pdf-analyze

Name: pdf-analyze
Author: 21pounder

Low Risk

Process PDF documents for extraction and form filling

PDF documents often contain important data that is difficult to access programmatically. This skill provides Claude with comprehensive tools to extract text and tables, fill forms, merge documents, and convert PDFs to images for analysis.

Supports: Claude, Codex, Claude Code (CC)

Quality score: 71 (Adequate)

Download the skill ZIP

Upload in Claude

Go to Settings → Capabilities → Skills → Upload skill

Toggle on and start using

Test it

Using "pdf-analyze". Extract text from report.pdf and list all tables found

Expected outcome:

Extracted 24 pages from report.pdf
Found 5 tables:
- Table 1: 'Revenue by Region' (page 3)
- Table 2: 'Q4 Performance Metrics' (page 7)
- Table 3: 'Customer Demographics' (page 12)
- Table 4: 'Year-over-Year Growth' (page 18)
- Table 5: 'Projected 2025 Targets' (page 22)
Saved extracted_text.txt (45 KB) and tables to tables_20250110.xlsx

Security Audit

Low Risk

v3 • 1/10/2026

This is a legitimate PDF processing toolkit containing utility scripts for extracting text, filling forms, and manipulating documents. All code uses standard Python PDF libraries with no network access, no credential theft patterns, and no obfuscation. The skill's behavior aligns with its stated purpose.

Lines analyzed: 1,492

No security issues found

Audited by: claude

Quality Score

Architecture

100

Maintainability

Content

Community

Security

Spec Compliance

What You Can Build

Extract tables from reports

Pull structured data from financial reports, research papers, and statistical documents into CSV or Excel format.

Automate form completion

Fill out PDF forms programmatically with validated data for applications, surveys, and official documents.

Build PDF processing workflows

Create document processing pipelines that merge, split, and transform PDFs for applications and services.

Try These Prompts

Extract PDF text

Extract all text from document.pdf using pdfplumber and save it to extracted_text.txt

List form fields

Check if application_form.pdf has fillable form fields, and if so, list all field names and types

Extract tables

Extract all tables from quarterly_report.pdf and save them to an Excel file with one sheet per table

Fill PDF form

Fill in the fields in application_form.pdf using the values in field_values.json, and save the result to completed_form.pdf
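
The "List form fields" prompt can be approximated directly with pypdf, which the FAQ names as the library for basic operations. The following is a minimal sketch under stated assumptions, not the skill's own script: the names list_form_fields, describe_field_type, and FIELD_TYPE_LABELS are hypothetical, and the labels simply map the /FT (field type) values defined by the PDF format.

```python
from typing import Dict

# Human-readable labels for the PDF /FT (field type) entry.
FIELD_TYPE_LABELS = {
    "/Tx": "text",
    "/Btn": "button (checkbox, radio, or push button)",
    "/Ch": "choice (list or combo box)",
    "/Sig": "signature",
}


def describe_field_type(ft) -> str:
    """Map a raw /FT value to a readable label; anything else is 'unknown'."""
    return FIELD_TYPE_LABELS.get(str(ft), "unknown")


def list_form_fields(pdf_path: str) -> Dict[str, str]:
    """Return {field name: field type label} for a fillable PDF.

    An empty dict means the PDF has no fillable form fields.
    """
    from pypdf import PdfReader  # imported lazily so the helper above works without pypdf

    fields = PdfReader(pdf_path).get_fields() or {}
    return {name: describe_field_type(field.get("/FT")) for name, field in fields.items()}
```

Calling list_form_fields("application_form.pdf") on a non-fillable PDF returns an empty dict, which is the signal to switch to the non-fillable workflow described in the FAQ.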

Best Practices

Validate form field values before submission to catch errors early
Convert PDF to images first when working with non-fillable forms to visually verify annotation placement
Use the bounding box validation script to ensure annotations do not overlap or obscure existing content
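
The overlap check mentioned in the last practice can be illustrated in a few lines. This is a sketch only and not the skill's bundled validation script: the Box layout (x0, y0, x1, y1) and the function names are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# (x0, y0, x1, y1) with x0 < x1 and y0 < y1, all in the same coordinate system.
Box = Tuple[float, float, float, float]


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two rectangles share any interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def find_overlaps(boxes: Dict[str, Box]) -> List[Tuple[str, str]]:
    """Return every pair of field names whose bounding boxes overlap."""
    return [
        (name_a, name_b)
        for (name_a, box_a), (name_b, box_b) in combinations(boxes.items(), 2)
        if boxes_overlap(box_a, box_b)
    ]
```

An empty result from find_overlaps means no two annotations collide; any returned pair names the fields whose boxes need to be moved or resized before filling.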

Avoid

Skipping the form field validation step before filling PDFs
Not converting non-fillable PDFs to images for visual analysis first
Using hardcoded file paths instead of parameters for reusability

Frequently Asked Questions

Which Python libraries does this skill use?

Primary libraries are pypdf for basic operations, pdfplumber for text and table extraction, and reportlab for creating new PDFs.
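
As an illustration of the pdfplumber part, the sketch below collects page text and tables from a PDF. It is an assumption-laden example rather than the skill's own code: extract_text_and_tables and clean_table are hypothetical names, and the cleaning step exists because pdfplumber can return None for empty table cells.

```python
from typing import List, Optional, Tuple

Table = List[List[Optional[str]]]


def clean_table(table: Table) -> List[List[str]]:
    """Replace empty (None) cells with "" and trim surrounding whitespace."""
    return [[(cell or "").strip() for cell in row] for row in table]


def extract_text_and_tables(pdf_path: str) -> Tuple[str, List[List[List[str]]]]:
    """Return the concatenated page text and a list of cleaned tables."""
    import pdfplumber  # imported lazily so clean_table works without pdfplumber

    texts, tables = [], []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_text() may yield nothing for image-only pages
            texts.append(page.extract_text() or "")
            tables.extend(clean_table(t) for t in page.extract_tables())
    return "\n".join(texts), tables
```

Each cleaned table is a plain list of rows, so it can be written out with the csv module or handed to a spreadsheet library of your choice.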

What are the system requirements?

Requires Python 3.8+ and the pypdf, pdfplumber, reportlab, pdf2image, and Pillow packages, installed with pip (Pillow provides the PIL module). Poppler must also be installed for PDF-to-image conversion.
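
A quick way to confirm these requirements before running anything is to probe for the modules and for Poppler. This is a hedged sketch using only the standard library: the function names are hypothetical, and checking for the pdftoppm executable assumes Poppler's command-line tools are expected on PATH, which is how pdf2image normally locates them.

```python
import importlib.util
import shutil
from typing import List

# Import names, not pip package names: Pillow is imported as PIL.
REQUIRED_MODULES = ["pypdf", "pdfplumber", "reportlab", "pdf2image", "PIL"]


def missing_modules(modules: List[str]) -> List[str]:
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def poppler_available() -> bool:
    """True if Poppler's pdftoppm tool is on PATH."""
    return shutil.which("pdftoppm") is not None
```

If missing_modules(REQUIRED_MODULES) returns a non-empty list, install those packages with pip; if poppler_available() is False, install Poppler through your operating system's package manager.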

How do I fill a scanned PDF that is not fillable?

Use the non-fillable form workflow: convert PDF to images, manually determine text entry locations, create fields.json with bounding boxes, then use fill_pdf_form_with_annotations.py.

Is my data safe when processing PDFs?

Yes. All processing is local using Python libraries. No data is sent to external servers. Files are only read from and written to paths you specify.

Why does my filled PDF show annotations in the wrong position?

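This usually indicates an incorrect coordinate transformation. PDF coordinates start at the bottom-left of the page with y increasing upward, while image coordinates start at the top-left with y increasing downward. Verify that your bounding box conversion flips the y-axis and scales from image pixels to PDF points.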
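
The conversion can be sketched as follows. This is a minimal example, not the skill's implementation: image_box_to_pdf is a hypothetical name, and the code assumes an unrotated page whose PDF origin is at its bottom-left corner.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def image_box_to_pdf(
    box: Box,
    image_size: Tuple[float, float],
    page_size: Tuple[float, float],
) -> Box:
    """Convert an image-space box to PDF space.

    box:        (left, top, right, bottom) in pixels, origin at top-left
    image_size: (width, height) of the rendered page image in pixels
    page_size:  (width, height) of the PDF page in points
    Returns (x0, y0, x1, y1) in points, origin at bottom-left.
    """
    left, top, right, bottom = box
    img_w, img_h = image_size
    page_w, page_h = page_size
    scale_x = page_w / img_w
    scale_y = page_h / img_h
    # Flip the y-axis: the image's bottom edge becomes the lower PDF y value.
    return (
        left * scale_x,
        page_h - bottom * scale_y,
        right * scale_x,
        page_h - top * scale_y,
    )
```

For a US Letter page (612 x 792 points) rendered at 1224 x 1584 pixels, the image box (100, 200, 300, 400) maps to the PDF box (50.0, 592.0, 150.0, 692.0). Forgetting the y flip would place the annotation near the bottom of the page instead of near the top.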

How is this different from using pdf-lib in JavaScript?

The Python tools provide more mature text extraction and table parsing. pdf-lib is better suited for browser environments or Node.js projects that need to create or modify PDFs client-side.