scanlume-ocr-api

Name: scanlume-ocr-api
Author: daanaagua

Low Risk · 🌐 Network access · ⚙️ External commands · 🔑 Env variables

Extract text and tables from images with OCR

Manual text extraction from images and screenshots is slow and error-prone. This skill automates OCR processing through the Scanlume API, delivering structured text, Markdown, HTML, or table data from image files.

Supports: Claude, Codex, Claude Code (CC)

🥉 73 Bronze

Download the skill ZIP

Upload in Claude

Go to Settings → Capabilities → Skills → Upload skill

Toggle on and start using

Test it

Using "scanlume-ocr-api". A screenshot of a product page showing a title, description paragraph, and pricing table

Expected outcome:

Simple mode: Returns raw text with the title, description text, and pricing data in a flat format.
Formatted mode: Returns Markdown with heading structure, paragraph text, and an HTML table representation of the pricing data. The response includes blocks with type information (h1, p, table) and a tableSummary showing tableCount and recordCount.

Using "scanlume-ocr-api". A photograph of a printed invoice containing a multi-row table with columns for item, quantity, and price

Expected outcome:

Formatted mode extracts each table cell with position data (rowStart, colStart), text content, and header flags. The records array maps each row with field names and values. A tableSummary reports the number of tables and records found.
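
A minimal sketch of how the cell data described above could be rebuilt into a row-by-column grid. The cell shape used here is an assumption based on the field names in this listing, not a confirmed schema: `isHeader` is a hypothetical name for the header flag, and the indices are assumed to be zero-based.

```python
from typing import Any


def cells_to_grid(cells: list[dict[str, Any]]) -> list[list[str]]:
    """Place each cell's text at (rowStart, colStart) in a dense grid."""
    if not cells:
        return []
    n_rows = max(c["rowStart"] for c in cells) + 1
    n_cols = max(c["colStart"] for c in cells) + 1
    # Start with empty strings so missing cells stay blank.
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c["rowStart"]][c["colStart"]] = c["text"]
    return grid


# Hypothetical sample data (assumed field names, zero-based positions).
sample_cells = [
    {"rowStart": 0, "colStart": 0, "text": "Item", "isHeader": True},
    {"rowStart": 0, "colStart": 1, "text": "Qty", "isHeader": True},
    {"rowStart": 1, "colStart": 0, "text": "Widget", "isHeader": False},
    {"rowStart": 1, "colStart": 1, "text": "2", "isHeader": False},
]

for row in cells_to_grid(sample_cells):
    print(" | ".join(row))
```

If the real response uses one-based positions or merged cells, the index arithmetic would need adjusting.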

Security Audit

Low Risk

v1 • 4/16/2026

The static analyzer reported 168 potential issues with a risk score of 100/100. After evaluation, the vast majority are false positives. The 116 'Ruby/shell backtick execution' findings result from the scanner confusing markdown code fences (```) in documentation files with actual shell execution. The 'Weak cryptographic algorithm' findings flag base64 encoding used for API data URL construction, which is not cryptographic. 'Hardcoded URL' findings reference the skill's own documented API endpoints (api.scanlume.com, www.scanlume.com). The critical heuristic flag for 'Code execution + Network + Credential access' describes expected behavior for a legitimate API client skill. Real risk factors are limited to standard API client patterns: network requests to the Scanlume API, environment variable access for the SCANLUME_API_KEY, and execution of a bundled Python helper script. No malicious intent, credential exfiltration, or prompt injection attempts were detected.

Files scanned

667

Lines analyzed

findings

Total audits

Low Risk Issues (4)

scripts/scanlume_ocr.py:70-98

Network requests to external API

The Python helper script makes HTTP POST requests to https://api.scanlume.com/v1/api/ocr. This is expected behavior for an API client skill. The API key is transmitted in the Authorization header over HTTPS.
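
As an illustration of the request pattern this finding describes, the sketch below builds (but does not send) a POST request with the standard library. The endpoint comes from the audit text. The payload keys `image` and `mode` and the `Bearer` scheme are assumptions for illustration only and may not match the real API.

```python
import json
import urllib.request

# Endpoint as named in the audit finding.
OCR_ENDPOINT = "https://api.scanlume.com/v1/api/ocr"


def build_ocr_request(api_key: str, data_url: str, mode: str = "simple") -> urllib.request.Request:
    """Build an HTTPS POST request; nothing is sent until urlopen() is called."""
    # ASSUMPTION: the payload keys and the "Bearer" scheme are hypothetical.
    body = json.dumps({"image": data_url, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        OCR_ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Separating request construction from sending makes it possible to inspect the headers and body before any network traffic occurs.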

scripts/scanlume_ocr.py:25-32

Environment variable access for API key

The script reads SCANLUME_API_KEY from the environment. This is the recommended pattern for API key handling. No credentials are hardcoded in the source files.
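
One way this pattern could look, as a sketch rather than the bundled script's actual code. The function name `resolve_api_key` is hypothetical, and letting an explicit argument take precedence over the environment variable is an assumption.

```python
import os
from typing import Optional


def resolve_api_key(cli_value: Optional[str] = None) -> str:
    """Return the API key from an explicit value or SCANLUME_API_KEY."""
    # ASSUMPTION: an explicitly passed value wins over the environment.
    key = (cli_value or os.environ.get("SCANLUME_API_KEY", "")).strip()
    if not key:
        raise RuntimeError("Set SCANLUME_API_KEY or pass --api-key")
    return key
```

Failing early with a clear message avoids sending an unauthenticated request.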

scripts/scanlume_ocr.py:161-165

Local filesystem read access

The helper script reads arbitrary local image files specified by the user to build base64 data URLs. File existence and type checks are performed, but any readable file path could be processed.
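
A minimal sketch of building a base64 data URL from a local file with an existence and type check, in the spirit of what this finding describes. It is not the helper script's actual code, and `file_to_data_url` is a hypothetical name.

```python
import base64
import mimetypes
from pathlib import Path


def file_to_data_url(path: str) -> str:
    """Read a local image file and return it as a base64 data URL."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(path)
    # Guess the MIME type from the file extension only.
    mime, _ = mimetypes.guess_type(p.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not an image file: {path}")
    encoded = base64.b64encode(p.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Because the type check relies on the extension, a renamed non-image file would still pass, which matches the caveat that any readable path could be processed.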

SKILL.md:52-53

External command execution via Python script

The skill documentation instructs users to run a bundled Python script via 'python scripts/scanlume_ocr.py'. This is a documented helper for calling the OCR API from local files. The script uses only the Python standard library with no subprocess or shell execution.

Risk Factors

🌐 Network access (4)

scripts/scanlume_ocr.py:13 scripts/scanlume_ocr.py:70-98 scripts/scanlume_ocr.py:174 SKILL.md:8-10

⚙️ External commands (2)

SKILL.md:50-53 SKILL.md:101-104

🔑 Env variables (4)

scripts/scanlume_ocr.py:21-22 scripts/scanlume_ocr.py:25-32 scripts/scanlume_ocr.py:167 SKILL.md:27

Audited by: claude

Quality Score

Architecture

100

Maintainability

Content

Community

Security

Spec Compliance

What You Can Build

Extract text from screenshot archives

Process a collection of screenshots to extract embedded text for documentation, bug reports, or knowledge management. Use simple mode for speed or formatted mode for structured output.

Convert table images to structured data

Extract table data from images of financial reports, invoices, or spreadsheets into Markdown or HTML format that can be edited and analyzed further.

Digitize printed documents

Convert photographs or scans of printed documents into searchable, editable text. Useful for archiving and making printed content accessible.

Try These Prompts

Quick text extraction from an image

Extract all text from this image at <image_path> using the Scanlume OCR API in simple mode. Return the raw text output.

Formatted extraction with Markdown output

Use the Scanlume OCR API in formatted mode to process this image at <image_path>. Return the Markdown result so I can see headings, paragraphs, and structure.

Table extraction from a financial report image

Process this image at <image_path> with formatted mode to extract table data. Return the JSON response and summarize the table count, row groups, and record count from the tableSummary.

OCR with the Python helper script

Run the Python helper script to call the Scanlume OCR API for the local file at <image_path> in formatted mode, outputting Markdown. Show the command and explain each argument.

Best Practices

Confirm the input is an image file (JPG, PNG) and not a PDF before calling the API. The public API does not support PDF OCR.
Use simple mode for raw text extraction when speed and cost are priorities. Use formatted mode when document structure, tables, or rich output formats are needed.
Set the SCANLUME_API_KEY environment variable rather than passing it on the command line to avoid exposing credentials in shell history.
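
The first practice above could be enforced with a pre-flight check on the file's leading bytes, sketched below. The signatures for PNG, JPEG, and PDF are standard. The function names are hypothetical, and limiting the accepted set to PNG and JPEG is a conservative assumption, since the listing only says "such as JPG and PNG".

```python
def sniff_kind(header: bytes) -> str:
    """Classify a file by its leading bytes."""
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header.startswith(b"%PDF-"):
        return "pdf"
    return "unknown"


def is_supported_image(header: bytes) -> bool:
    """True only for the image types this sketch recognizes."""
    # ASSUMPTION: other image formats may be accepted by the API; they are rejected here.
    return sniff_kind(header) in {"png", "jpeg"}
```

Reading the first eight bytes of a file is enough for these checks, for example `open(path, "rb").read(8)`.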

Avoid

Do not use simple mode when the image contains tables or structured layout. Formatted mode provides the table-aware output needed for structured data extraction.
Do not assume remote URL support is available. The public API only accepts base64 data URLs in the request payload.
Do not present the PDF OCR route as publicly available. It is currently behind a beta waitlist and not generally available.

Frequently Asked Questions

What image formats does the Scanlume OCR API support?

The public API accepts image files such as JPG and PNG. PDF files are not supported through the public API at this time. PDF OCR is available on the Scanlume website, but the public API route for it remains beta-gated.

What is the difference between simple and formatted mode?

Simple mode returns plain text and costs 1 credit per image. Formatted mode returns structured output including Markdown, HTML, and table data, and costs 2 credits per image. Use formatted mode when document structure or table extraction is needed.
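
The per-image costs above make it easy to estimate credit usage before a run. The sketch below uses the stated rates (1 credit for simple, 2 for formatted); the function name is hypothetical.

```python
# Credits per image, as stated in the answer above.
CREDITS_PER_IMAGE = {"simple": 1, "formatted": 2}


def estimate_credits(image_count: int, mode: str) -> int:
    """Estimate total credits for processing image_count images in one mode."""
    if mode not in CREDITS_PER_IMAGE:
        raise ValueError(f"unknown mode: {mode}")
    if image_count < 0:
        raise ValueError("image_count must be non-negative")
    return image_count * CREDITS_PER_IMAGE[mode]


print(estimate_credits(10, "simple"))     # 10
print(estimate_credits(10, "formatted"))  # 20
```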

How do I get an API key for the Scanlume OCR API?

You need a SCANLUME_API_KEY to authenticate with the API. Set it as an environment variable or pass it via the --api-key argument to the helper script.

Can I use remote image URLs with this API?

No. The public API does not support remote file URLs. You must provide the image as a base64 data URL. The helper script handles converting local files to base64 data URLs automatically.

How do I run OCR from the command line?

Use the bundled Python script: python scripts/scanlume_ocr.py <path> --mode formatted --output md. Set the SCANLUME_API_KEY environment variable first, or pass --api-key directly.
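
When calling the helper from another Python program, the same command can be assembled as an argument list. This sketch uses only the flags shown above (`--mode`, `--output`); the function name is hypothetical and the script path is assumed to be relative to the skill's root directory.

```python
import sys


def build_helper_command(image_path: str, mode: str = "formatted", output: str = "md") -> list[str]:
    """Assemble the argument list for the bundled helper script."""
    # The API key is deliberately left out; it is expected in SCANLUME_API_KEY.
    return [
        sys.executable,
        "scripts/scanlume_ocr.py",
        image_path,
        "--mode", mode,
        "--output", output,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with paths that contain spaces.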

Is PDF OCR available through this skill?

No. The public v1 API covers image OCR only. PDF OCR is available on the Scanlume website, but the public API route for it is beta-gated and not generally available.