pdf

Name: pdf
Author: ZhanlinCui

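Low risk · 📁 Filesystem access · ⚙️ External commands

Programmatically manipulate PDF documents and fill in forms

Also available from: ArtemisAI,sickn33,Azeem-2,92Bilal26,92Bilal26,anthropics,AutumnsGrove,DYAI2025,K-Dense-AI,davila7,Cam10001110101,ComposioHQ

PDF processing tasks call for dedicated tools for extraction, manipulation, and form filling. This skill provides comprehensive PDF processing using Python libraries and command-line tools.

Supports: Claude Codex Code(CC)

🥉 75 Bronze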

Download the skill ZIP

Upload it in Claude

Go to Settings → Capabilities → Skills → Upload skill

Open it and start using it

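Test it

Using "pdf". Extract text from document.pdf

Expected result:

Successfully extracted 2,450 characters from 5 pages. Key sections identified: Executive Summary, Financial Data, Conclusion.

Using "pdf". Merge file1.pdf, file2.pdf, and file3.pdf

Expected result:

Created merged.pdf (15 pages total) from file1.pdf (3 pages), file2.pdf (7 pages), and file3.pdf (5 pages)

Using "pdf". Fill in form.pdf with field_values.json

Expected result:

Filled 12 form fields across 2 pages. Output saved to form_filled.pdf with validated field values.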
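
A minimal sketch of the kind of validation the form-filling example refers to. The schema of field_values.json is not shown in this listing, so the flat "field name to value" layout used here is an assumption, and both function names are hypothetical. The form's field names are read with pypdf's PdfReader.get_fields().

```python
import json


def validate_field_values(form_fields, values):
    """Return a list of problems; an empty list means the values look usable.

    form_fields: mapping of field name -> field info (e.g. from PdfReader.get_fields())
    values: mapping of field name -> value to fill in (assumed flat layout)
    """
    problems = []
    for name, value in values.items():
        if name not in form_fields:
            problems.append(f"unknown field: {name}")
        elif value is None or str(value).strip() == "":
            problems.append(f"empty value for field: {name}")
    return problems


def load_and_validate(pdf_path, values_path):
    """Read the form's fields and the JSON values, then validate them."""
    # pypdf is imported lazily so the pure helper above works without it.
    from pypdf import PdfReader

    # get_fields() returns None when the PDF has no form fields.
    fields = PdfReader(pdf_path).get_fields() or {}
    with open(values_path, encoding="utf-8") as fh:
        values = json.load(fh)
    return validate_field_values(fields, values)
```

Calling load_and_validate("form.pdf", "field_values.json") before filling would surface misspelled field names early, rather than producing a silently incomplete form.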

Security audit

Low risk

v1 • 2/24/2026

Static analysis flagged 217 potential issues, but most are false positives from markdown documentation files. External command detections are code examples in backticks (markdown formatting), not actual shell execution. Filesystem operations in Python scripts are legitimate PDF/JSON processing with user-provided paths. No confirmed malicious patterns detected.

Files scanned

1,878

Lines analyzed

Findings

Total audits

Medium-risk issues (1)

forms.md:4 reference.md:11 SKILL.md:15

External Command Execution Patterns

Static analysis detected shell command patterns in documentation files. These are markdown code examples demonstrating command-line tool usage (qpdf, pdftotext, pdfimages), not actual executable code. All commands are intended for user reference only.

Low-risk issues (2)

scripts/extract_form_field_info.py:143 scripts/fill_fillable_fields.py:55 scripts/fill_pdf_form_with_annotations.py:93

Filesystem Write Operations

Python scripts perform file write operations for PDF output and JSON data. All file paths are provided as command-line arguments by the user, with no hardcoded paths or unauthorized file access.

scripts/extract_form_field_info.py:32 scripts/extract_form_field_info.py:81

Hardcoded Documentation URLs

Scripts contain hardcoded URLs pointing to PDF specification documentation (Adobe, WestHealth). These are reference links for developers, not network exfiltration endpoints.

Risk factors

📁 Filesystem access (4)

reference.md:59 scripts/extract_form_field_info.py:143 scripts/fill_fillable_fields.py:55 scripts/fill_pdf_form_with_annotations.py:93

⚙️ External commands (5)

forms.md:4 reference.md:11 SKILL.md:15 scripts/check_bounding_boxes.py:6 scripts/extract_form_field_info.py:11

Audited by: claude

Quality score

Architecture

100

Maintainability

Content

Community

Security

Spec compliance

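What you can build

Extract data from PDF reports

Automatically extract text and tables from financial or scientific PDF reports for data analysis

Fill in PDF application forms

Programmatically complete fillable PDF forms using user-supplied data, with validation

Batch PDF document processing

Merge, split, rotate, and watermark multiple PDF documents in automated workflows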
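
As an illustration of the merge step in such a workflow, here is a small sketch using pypdf's PdfReader and PdfWriter. The function name merge_pdfs is hypothetical and this is not necessarily how the skill's own scripts do it.

```python
def merge_pdfs(input_paths, output_path):
    """Concatenate the pages of several PDFs, in order, into one file.

    Returns a mapping of input path -> number of pages taken from it.
    """
    # Imported lazily so this module loads even where pypdf is missing.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    page_counts = {}
    for path in input_paths:
        reader = PdfReader(path)
        for page in reader.pages:
            writer.add_page(page)
        page_counts[path] = len(reader.pages)

    with open(output_path, "wb") as fh:
        writer.write(fh)
    return page_counts
```

Returning the per-file page counts makes it easy to report a summary like the one in the merge example earlier on this page.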

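Try these prompts

Extract text from a PDF

Extract all text content from the attached PDF document and summarize the key information.

Merge multiple PDFs

Merge these PDF files into a single document in this order: cover.pdf, chapter1.pdf, chapter2.pdf, appendix.pdf

Fill in a PDF form with user data

I need to fill in this application form. Please extract the field information first, then I will provide the value for each field.

Extract tables and convert to Excel

Extract all tables from this financial report PDF and save them as an Excel spreadsheet, with each table on its own worksheet

Best practices

Always validate PDF form field values before filling to prevent errors
Use a high-resolution setting (300+ DPI) when converting PDFs to images for OCR
Check for bounding-box intersections when adding annotations to non-fillable PDFs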
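
The bounding-box check in the last practice can be sketched in plain Python. This is an illustrative version, not the code in scripts/check_bounding_boxes.py. It assumes boxes are (left, top, right, bottom) tuples in image coordinates, with top smaller than bottom, and the field names are made up.

```python
from itertools import combinations


def boxes_intersect(a, b):
    """True if two (left, top, right, bottom) boxes overlap with non-zero area."""
    a_left, a_top, a_right, a_bottom = a
    b_left, b_top, b_right, b_bottom = b
    return (
        a_left < b_right
        and b_left < a_right
        and a_top < b_bottom
        and b_top < a_bottom
    )


def find_overlaps(boxes):
    """Return the name pairs of every two boxes that intersect."""
    return [
        (name_a, name_b)
        for (name_a, box_a), (name_b, box_b) in combinations(boxes.items(), 2)
        if boxes_intersect(box_a, box_b)
    ]


# Hypothetical field boxes: "name" and "date" overlap, "signature" is clear.
fields = {
    "name": (10, 10, 100, 30),
    "date": (90, 20, 150, 40),
    "signature": (10, 50, 100, 70),
}
print(find_overlaps(fields))
```

Boxes that only touch along an edge are not reported, because the comparisons are strict.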

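Avoid

Do not skip visual verification of bounding boxes when filling non-fillable forms
Do not process password-protected PDFs before decrypting them
Do not assume every PDF has extractable text; scanned PDFs need OCR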
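
One way to act on the last point is to test whether a PDF yields any text before choosing an extraction path. The sketch below is a heuristic only: the 20-character threshold and both function names are assumptions, and pypdf's page.extract_text() is used to collect the per-page text.

```python
def needs_ocr(page_texts, min_chars_per_page=20):
    """Guess whether a PDF is scanned: True if every page has almost no text."""
    texts = list(page_texts)
    if not texts:
        return False  # no pages, so there is nothing to OCR
    return all(len((text or "").strip()) < min_chars_per_page for text in texts)


def extract_page_texts(pdf_path):
    """Return the extractable text of each page (empty string if none)."""
    # Imported lazily so needs_ocr() can be used without pypdf installed.
    from pypdf import PdfReader

    return [page.extract_text() or "" for page in PdfReader(pdf_path).pages]
```

A caller would run needs_ocr(extract_page_texts("document.pdf")) and fall back to an OCR workflow only when it returns True.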

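FAQ

Which Python libraries do I need to install?

Core libraries: pypdf, pdfplumber, reportlab. Optional: pytesseract for OCR and pdf2image for PDF-to-image conversion. Install with: pip install pypdf pdfplumber reportlab

How do I handle scanned PDFs that contain no text?

Scanned PDFs require OCR. Use a pytesseract workflow: convert the PDF pages to images with pdf2image, then apply pytesseract.image_to_string() to extract the text.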
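
The workflow described in this answer could look roughly like the following. The function name ocr_pdf is hypothetical. The sketch assumes the Poppler and Tesseract binaries are installed, since pdf2image and pytesseract are wrappers around them.

```python
def ocr_pdf(pdf_path, dpi=300, lang="eng"):
    """Render each page to an image and OCR it; returns one string per page."""
    # Imported lazily: both packages are optional dependencies.
    from pdf2image import convert_from_path
    import pytesseract

    # 300 DPI follows the resolution recommended under best practices.
    images = convert_from_path(pdf_path, dpi=dpi)
    return [pytesseract.image_to_string(image, lang=lang) for image in images]
```

Large documents render every page into memory at once here; processing pages in smaller ranges may be preferable for long scans.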

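Can this skill fill in non-fillable PDF forms?

Yes, by adding text annotations at computed positions. The process involves converting the PDF to images, visually identifying field locations, creating bounding boxes, and adding the annotations.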
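
Computing the position usually means mapping a box found on the rendered image back to PDF coordinates. The sketch below shows that conversion under two stated assumptions: the page is not rotated and its origin is the bottom-left corner at (0, 0). The function name is hypothetical and this is not taken from the skill's scripts.

```python
def image_box_to_pdf_rect(box, image_size, page_size):
    """Convert an image-space box to a PDF-space rectangle.

    box: (left, top, right, bottom) in image pixels, origin at top-left
    image_size: (width, height) of the rendered page image in pixels
    page_size: (width, height) of the PDF page in points
    Returns (x0, y0, x1, y1) in points, origin at bottom-left.
    """
    left, top, right, bottom = box
    image_width, image_height = image_size
    page_width, page_height = page_size
    x_scale = page_width / image_width
    y_scale = page_height / image_height
    return (
        left * x_scale,
        page_height - bottom * y_scale,  # image bottom edge is the lower y in PDF space
        right * x_scale,
        page_height - top * y_scale,
    )


# A US Letter page (612 x 792 points) rendered at twice its point size.
print(image_box_to_pdf_rect((100, 200, 300, 240), (1224, 1584), (612, 792)))
# prints (50.0, 672.0, 150.0, 692.0)
```

The vertical flip is the step most often missed: image rows count downward from the top, while PDF coordinates count upward from the bottom.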

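Which command-line tools are available?

poppler-utils (pdftotext, pdfimages, pdftoppm), qpdf for merging and splitting, and pdftk for advanced operations. Install poppler-utils and qpdf through your system package manager.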
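
If these tools are driven from Python, one hedged approach is to build the argument lists explicitly and run them with subprocess. The helper names below are hypothetical. The command shapes follow qpdf's `--empty --pages ... --` merge syntax and pdftotext's `-layout` flag.

```python
import shutil
import subprocess


def qpdf_merge_command(inputs, output):
    """Build the qpdf argument list that merges the inputs into output."""
    return ["qpdf", "--empty", "--pages", *inputs, "--", output]


def pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Build the pdftotext argument list, optionally preserving page layout."""
    command = ["pdftotext"]
    if keep_layout:
        command.append("-layout")
    return command + [pdf_path, txt_path]


def run_tool(command):
    """Run a command list, failing early if the tool is not on PATH."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} is not installed or not on PATH")
    subprocess.run(command, check=True)
```

Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.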

How do I extract tables from a PDF?

Use pdfplumber's extract_tables() method. For complex tables, configure table_settings with the vertical_strategy and horizontal_strategy parameters for better detection.
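
A short sketch of that call, assuming pdfplumber is installed. The function name and the TEXT_BASED_SETTINGS constant are illustrative. The "text" strategy is one of pdfplumber's documented options and tends to suit tables drawn without ruling lines, though the best settings depend on the document.

```python
# Settings that infer cell edges from text alignment instead of drawn lines.
TEXT_BASED_SETTINGS = {
    "vertical_strategy": "text",
    "horizontal_strategy": "text",
}


def extract_tables(pdf_path, table_settings=None):
    """Return a list of (page_number, table) pairs; each table is a list of rows."""
    # Imported lazily so the settings above can be reused without pdfplumber.
    import pdfplumber

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            # An empty dict means "use pdfplumber's default settings".
            for table in page.extract_tables(table_settings=table_settings or {}):
                tables.append((page_number, table))
    return tables
```

A reasonable first attempt is extract_tables("report.pdf") with the defaults, switching to TEXT_BASED_SETTINGS only if tables come back empty or merged.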

Can I process encrypted PDFs?

Yes, if you have the password. Use pypdf's decrypt() method or the qpdf --password option. Encrypted PDFs cannot be processed without the password.
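
With pypdf, the decrypt step might be sketched as follows. The function name decrypt_pdf is hypothetical. For AES-encrypted files pypdf may also need an extra cryptography package installed, which is an environment assumption rather than something this listing states.

```python
def decrypt_pdf(input_path, output_path, password):
    """Write an unencrypted copy of a PDF; returns the number of pages copied."""
    # Imported lazily so this module loads even where pypdf is missing.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(input_path)
    # decrypt() returns a falsy value when the password does not match.
    if reader.is_encrypted and not reader.decrypt(password):
        raise ValueError("wrong password: the PDF could not be decrypted")

    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    with open(output_path, "wb") as fh:
        writer.write(fh)
    return len(reader.pages)
```

Files that are not encrypted pass through unchanged, so the same function can sit at the front of a pipeline that receives a mix of protected and unprotected PDFs.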

Developer details

Author

ZhanlinCui

License

Proprietary. LICENSE.txt has complete terms

Repository

https://github.com/ZhanlinCui/Ultimate-Agent-Skills-Collection/tree/main/pdf

Ref

main

File structure

📁 scripts/

📄 check_bounding_boxes_test.py

📄 check_bounding_boxes.py

📄 check_fillable_fields.py

📄 convert_pdf_to_images.py

📄 create_validation_image.py

📄 extract_form_field_info.py

📄 fill_fillable_fields.py

📄 fill_pdf_form_with_annotations.py

📄 forms.md

📄 reference.md

📄 SKILL.md