scanlume-ocr-api
Extract text and tables from images with OCR
Manual text extraction from images and screenshots is slow and error-prone. This skill automates OCR processing through the Scanlume API, delivering structured text, Markdown, HTML, or table data from image files.
Download the skill ZIP
Upload in Claude
Go to Settings → Capabilities → Skills → Upload skill
Toggle on and start using
Test it
Using "scanlume-ocr-api" on a screenshot of a product page that shows a title, a description paragraph, and a pricing table
Expected outcome:
- Simple mode: Returns raw text with the title, description text, and pricing data in a flat format.
- Formatted mode: Returns Markdown with heading structure, paragraph text, and an HTML table representation of the pricing data. The response includes blocks with type information (h1, p, table) and a tableSummary showing tableCount and recordCount.
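The formatted-mode fields described above can be inspected with a few lines of Python. This is a minimal sketch, not the skill's own code: it assumes the response has already been parsed into a dict and that the JSON keys are spelled exactly `blocks`, `type`, `tableSummary`, `tableCount`, and `recordCount`. The function name `summarize_response` is hypothetical.

```python
from collections import Counter


def summarize_response(response):
    """Count block types and read table totals from a parsed response dict.

    Key spellings are assumptions based on the field names in the
    description; adjust them to match the actual API response.
    """
    blocks = response.get("blocks", [])
    block_types = Counter(block.get("type", "unknown") for block in blocks)
    table_summary = response.get("tableSummary", {})
    return {
        "blockTypes": dict(block_types),
        "tableCount": table_summary.get("tableCount", 0),
        "recordCount": table_summary.get("recordCount", 0),
    }


# Hand-written sample shaped like the description, not real API output.
sample = {
    "blocks": [{"type": "h1"}, {"type": "p"}, {"type": "table"}],
    "tableSummary": {"tableCount": 1, "recordCount": 3},
}
print(summarize_response(sample))
```

Missing keys fall back to empty or zero values, so the same helper can be pointed at a simple-mode response without raising.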
Using "scanlume-ocr-api" on a photograph of a printed invoice that contains a multi-row table with columns for item, quantity, and price
Expected outcome:
- Formatted mode extracts each table cell with position data (rowStart, colStart), text content, and header flags. The records array maps each row with field names and values. A tableSummary reports the number of tables and records found.
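Cell position data of this kind is enough to rebuild a table as a grid. The sketch below shows one way to do that under stated assumptions: each cell is a dict with `rowStart` and `colStart` as zero-based integer indices, and the cell content lives under a key named `text`. The `text` key and the zero-based indexing are guesses, and `cells_to_grid` is a hypothetical helper.

```python
def cells_to_grid(cells):
    """Arrange a flat list of cell dicts into a row-major grid of strings.

    Assumes zero-based "rowStart"/"colStart" indices and a "text" key;
    both are illustrative assumptions, not confirmed API details.
    """
    if not cells:
        return []
    row_count = max(cell["rowStart"] for cell in cells) + 1
    col_count = max(cell["colStart"] for cell in cells) + 1
    # Start with empty strings so positions with no cell stay blank.
    grid = [["" for _ in range(col_count)] for _ in range(row_count)]
    for cell in cells:
        grid[cell["rowStart"]][cell["colStart"]] = cell.get("text", "")
    return grid


# Hand-written sample cells, not real API output.
sample_cells = [
    {"rowStart": 0, "colStart": 0, "text": "Item"},
    {"rowStart": 0, "colStart": 1, "text": "Quantity"},
    {"rowStart": 1, "colStart": 0, "text": "Widget"},
    {"rowStart": 1, "colStart": 1, "text": "2"},
]
for row in cells_to_grid(sample_cells):
    print(" | ".join(row))
```

Cells that span several rows or columns are not handled here; only the starting position of each cell is used.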
Security Audit
Low Risk
The static analyzer reported 168 potential issues with a risk score of 100/100. After evaluation, the vast majority are false positives. The 116 'Ruby/shell backtick execution' findings result from the scanner confusing markdown code fences (```) in documentation files with actual shell execution. The 'Weak cryptographic algorithm' findings flag base64 encoding used for API data URL construction, which is not cryptographic. 'Hardcoded URL' findings reference the skill's own documented API endpoints (api.scanlume.com, www.scanlume.com). The critical heuristic flag for 'Code execution + Network + Credential access' describes expected behavior for a legitimate API client skill. Real risk factors are limited to standard API client patterns: network requests to the Scanlume API, environment variable access for the SCANLUME_API_KEY, and execution of a bundled Python helper script. No malicious intent, credential exfiltration, or prompt injection attempts were detected.
Low Risk Issues (4)
Risk Factors
🌐 Network access (4)
⚙️ External commands (2)
What You Can Build
Extract text from screenshot archives
Process a collection of screenshots to extract embedded text for documentation, bug reports, or knowledge management. Use simple mode for speed or formatted mode for structured output.
Convert table images to structured data
Extract table data from images of financial reports, invoices, or spreadsheets into Markdown or HTML format that can be edited and analyzed further.
Digitize printed documents
Convert photographs or scans of printed documents into searchable, editable text. Useful for archiving and making printed content accessible.
Try These Prompts
Extract all text from this image at <image_path> using the Scanlume OCR API in simple mode. Return the raw text output.
Use the Scanlume OCR API in formatted mode to process this image at <image_path>. Return the Markdown result so I can see headings, paragraphs, and structure.
Process this image at <image_path> with formatted mode to extract table data. Return the JSON response and summarize the table count, row groups, and record count from the tableSummary.
Run the Python helper script to call the Scanlume OCR API for the local file at <image_path> in formatted mode, outputting Markdown. Show the command and explain each argument.
Best Practices
- Confirm the input is an image file (JPG, PNG) and not a PDF before calling the API. The public API does not support PDF OCR.
- Use simple mode for raw text extraction when speed and cost are priorities. Use formatted mode when document structure, tables, or rich output formats are needed.
- Set the SCANLUME_API_KEY environment variable rather than passing it on the command line to avoid exposing credentials in shell history.
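The first and last practices above can be enforced before any request is sent. The following is a minimal sketch, not the bundled helper script: the function names `check_input` and `load_api_key` are hypothetical, and the accepted extension set (`.jpg`, `.jpeg`, `.png`) is an assumption drawn from the "JPG, PNG" note rather than a confirmed list of supported formats.

```python
import os
from pathlib import Path

# Assumed set of accepted extensions; the API may support others.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def check_input(path):
    """Reject PDFs and other non-image paths based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        raise ValueError("PDF input is not supported by the public API")
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return Path(path)


def load_api_key(env=None):
    """Read SCANLUME_API_KEY from the environment instead of the command line."""
    env = os.environ if env is None else env
    key = env.get("SCANLUME_API_KEY")
    if not key:
        raise RuntimeError("SCANLUME_API_KEY is not set")
    return key
```

The check looks only at the file extension, so a mislabeled file would still pass; inspecting the file's leading bytes would be a stricter alternative.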
Avoid
- Do not use simple mode when the image contains tables or structured layout. Formatted mode provides the table-aware output needed for structured data extraction.
- Do not assume remote URL support is available. The public API only accepts base64 data URLs in the request payload.
- Do not present the PDF OCR route as publicly available. It is currently behind a beta waitlist and not generally available.
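Because the public API is described as accepting only base64 data URLs, a local image has to be encoded before it is placed in the request payload. The sketch below builds such a data URL with the Python standard library. The helper name `to_data_url` is hypothetical, and the payload field that carries the data URL is not specified on this page, so it is left out.

```python
import base64
import mimetypes


def to_data_url(image_bytes, filename):
    """Encode raw image bytes as a base64 data URL.

    The MIME type is guessed from the filename; non-image types are rejected.
    """
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {filename}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Usage with a local file (the path is a placeholder):
# with open("<image_path>", "rb") as handle:
#     data_url = to_data_url(handle.read(), "<image_path>")
```

Base64 encoding grows the payload by roughly one third, which is worth keeping in mind for large images.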