# PDF Form Processing Guide
A complete guide to processing PDF forms in production environments.
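## Table of Contents
- Form Analysis and Field Detection
- Form Filling Workflow
- Validation Strategies
- Field Types and Handling
- Multi-Page Forms
- Flattening and Finalization
- Error Handling Patterns
- Production Examples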
## Form Analysis
### Analyzing Form Structure
Use `analyze_form.py` to extract complete form information:
```bash
python scripts/analyze_form.py application.pdf --output schema.json
```
Output format:
```json
{
  "full_name": {
    "type": "text",
    "required": true,
    "max_length": 100,
    "x": 120.5,
    "y": 450.2,
    "width": 300,
    "height": 20
  },
  "date_of_birth": {
    "type": "text",
    "required": true,
    "format": "MM/DD/YYYY",
    "x": 120.5,
    "y": 400.8,
    "width": 150,
    "height": 20
  },
  "email_newsletter": {
    "type": "checkbox",
    "required": false,
    "x": 120.5,
    "y": 350.4,
    "width": 15,
    "height": 15
  },
  "preferred_contact": {
    "type": "radio",
    "required": true,
    "options": ["email", "phone", "mail"],
    "x": 120.5,
    "y": 300.0,
    "width": 200,
    "height": 60
  }
}
```
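The `x`/`y` values appear to follow PDF user space: units of points (1/72 inch) with the origin at the bottom-left corner of the page, which is why fields further down the page have smaller `y` values. Assuming that convention, and assuming `y` is the bottom edge of the field box, a minimal sketch for converting a schema entry into a top-left-origin pixel box (for example, to draw overlays on a rendered page image) could look like this. `to_pixel_box` is a hypothetical helper, and the 792 pt page height is simply US Letter used for illustration.

```python
from typing import TypedDict


class PixelBox(TypedDict):
    left: float
    top: float
    width: float
    height: float


def to_pixel_box(field: dict, page_height_pt: float, dpi: float = 72.0) -> PixelBox:
    """Convert a schema entry (PDF points, bottom-left origin) into a
    top-left-origin box in pixels at the given rendering DPI.

    Assumption: field["y"] is the bottom edge of the field rectangle.
    """
    scale = dpi / 72.0  # 1 PDF point = 1/72 inch
    # Flip the y-axis: distance from the page top to the field's top edge
    top_pt = page_height_pt - (field["y"] + field["height"])
    return PixelBox(
        left=field["x"] * scale,
        top=top_pt * scale,
        width=field["width"] * scale,
        height=field["height"] * scale,
    )


# Example: the "full_name" entry on a US Letter page (612 x 792 pt) rendered at 144 DPI
full_name = {"x": 120.5, "y": 450.2, "width": 300, "height": 20}
print(to_pixel_box(full_name, page_height_pt=792, dpi=144))
```

The page height has to come from the actual page (for instance its MediaBox), since the same `y` maps to a different pixel row on A4 than on Letter.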
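### Programmatic Analysis
```python
from pypdf import PdfReader

reader = PdfReader("form.pdf")
# get_fields() returns None when the PDF has no interactive form
fields = reader.get_fields() or {}

for field_name, field_info in fields.items():
    print(f"Field: {field_name}")
    print(f"  Type: {field_info.get('/FT')}")      # e.g. /Tx, /Btn, /Ch, /Sig
    print(f"  Value: {field_info.get('/V')}")
    print(f"  Flags: {field_info.get('/Ff', 0)}")  # integer bit field
    print()
```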
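`/FT` and `/Ff` are raw PDF entries: `/FT` names the field type (`/Tx` text, `/Btn` button, `/Ch` choice, `/Sig` signature), and `/Ff` is an integer bit field whose meaning is defined by the PDF specification's field-flag tables. A minimal sketch that turns them into schema-style attributes, assuming the spec's 1-based bit positions (bit 1 ReadOnly, bit 2 Required, bit 13 Multiline for text, bit 16 Radio and bit 17 Pushbutton for buttons, bit 18 Combo for choice fields), is shown below. `describe_field` is a hypothetical helper, not part of pypdf.

```python
from typing import Optional


def flag(bit: int) -> int:
    """Mask for a 1-based bit position, as numbered in the PDF specification."""
    return 1 << (bit - 1)


# Flags common to all field types
READ_ONLY, REQUIRED = flag(1), flag(2)
# Type-specific flags
MULTILINE = flag(13)              # text fields
RADIO, PUSHBUTTON = flag(16), flag(17)  # button fields
COMBO = flag(18)                  # choice fields


def describe_field(ft: Optional[str], ff: int = 0) -> dict:
    """Map raw /FT and /Ff values to schema-style attributes."""
    if ft == "/Btn":
        # A button with neither Radio nor Pushbutton set is a checkbox
        if ff & PUSHBUTTON:
            kind = "pushbutton"
        elif ff & RADIO:
            kind = "radio"
        else:
            kind = "checkbox"
    elif ft == "/Ch":
        kind = "dropdown" if ff & COMBO else "listbox"
    elif ft == "/Tx":
        kind = "text"
    elif ft == "/Sig":
        kind = "signature"
    else:
        kind = "unknown"  # /FT missing or not recognized
    return {
        "type": kind,
        "required": bool(ff & REQUIRED),
        "read_only": bool(ff & READ_ONLY),
        "multiline": ft == "/Tx" and bool(ff & MULTILINE),
    }


# With pypdf: describe_field(field_info.get("/FT"), field_info.get("/Ff", 0))
print(describe_field("/Btn", RADIO | REQUIRED))
# {'type': 'radio', 'required': True, 'read_only': False, 'multiline': False}
```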
## Form Filling Workflow
### Basic Workflow
```bash
# 1. Analyze the form
python scripts/analyze_form.py template.pdf --output schema.json

# 2. Prepare the data
cat > data.json << EOF
{
  "full_name": "John Doe",
  "date_of_birth": "01/15/1990",
  "email": "john.doe@example.com",
  "email_newsletter": true,
  "preferred_contact": "email"
}
EOF

# 3. Validate the data
python scripts/validate_form.py data.json schema.json

# 4. Fill the form
python scripts/fill_form.py template.pdf data.json filled.pdf

# 5. Flatten (optional: makes the fields non-editable)
python scripts/flatten_form.py filled.pdf final.pdf
```
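Step 3 catches bad data before anything is written to the PDF. The internals of `validate_form.py` are not shown here; purely as an illustration, a minimal sketch of the checks the schema format above supports (`required`, `max_length`, `format`, `options`, and boolean checkbox values) could look like this. `validate` is a hypothetical function, and only the `MM/DD/YYYY` format is handled.

```python
from datetime import datetime
from typing import List

# strptime patterns for schema "format" values (only MM/DD/YYYY is assumed here)
DATE_FORMATS = {"MM/DD/YYYY": "%m/%d/%Y"}


def validate(data: dict, schema: dict) -> List[str]:
    """Check data against a schema.json-style dict; an empty list means valid."""
    errors: List[str] = []
    for name, spec in schema.items():
        value = data.get(name)
        # Missing or empty values only matter for required fields
        if value is None or value == "":
            if spec.get("required"):
                errors.append(f"{name}: required field is missing")
            continue

        kind = spec.get("type")
        if kind == "checkbox":
            if not isinstance(value, bool):
                errors.append(f"{name}: checkbox value must be true or false")
        elif kind == "radio":
            options = spec.get("options", [])
            if value not in options:
                errors.append(f"{name}: {value!r} is not one of {options}")
        elif kind == "text":
            text = str(value)
            max_len = spec.get("max_length")
            if max_len is not None and len(text) > max_len:
                errors.append(f"{name}: longer than {max_len} characters")
            pattern = DATE_FORMATS.get(spec.get("format", ""))
            if pattern is not None:
                try:
                    datetime.strptime(text, pattern)  # also rejects e.g. 02/30/1990
                except ValueError:
                    errors.append(f"{name}: expected a date in {spec['format']} format")
    # Keys in data that the schema does not list are ignored in this sketch
    return errors


schema = {  # a subset of the schema.json shown earlier
    "date_of_birth": {"type": "text", "required": True, "format": "MM/DD/YYYY"},
    "preferred_contact": {"type": "radio", "required": True,
                          "options": ["email", "phone", "mail"]},
}
for problem in validate({"date_of_birth": "02/30/1990", "preferred_contact": "fax"}, schema):
    print(problem)
```

Collecting every problem into a list, rather than raising on the first one, lets the caller report all issues in a single pass.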
### Programmatic Filling
python
from pypdf import PdfReader, PdfWriter
reader = PdfReader("template.pdf")
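writer = PdfWriter()
# Copy all pages; append() also carries over the /AcroForm dictionary,
# which update_page_form_field_values() expects to find in the writer
writer.append(reader)
# Fill form fields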
writer.update_page_form_field_values(
writer.pages[0],
{
"full_name": "John Doe",
"date_of_birt