extract

Low Risk ⚙️ External commands 🌐 Network access 📁 Filesystem access 🔑 Env variables

Extract Web Content from URLs

This skill extracts clean markdown or text content from specific URLs using Tavily's extraction API. Perfect for research, documentation retrieval, and content aggregation without writing custom scraping code.

Supports: Claude, Codex, Code (CC)
🥉 72 Bronze

1. Download the skill ZIP
2. Upload in Claude (go to Settings → Capabilities → Skills → Upload skill)
3. Toggle on and start using

Test it

Using "extract". Extract content from https://example.com/about

Expected outcome:

## About Example

Welcome to Example.com...

Our Mission

We strive to provide...

Using "extract". Extract information about pricing from https://example.com/pricing and https://example.com/plans

Expected outcome:

## Pricing Information

### Basic Plan - $9/month
- Feature A
- Feature B

### Pro Plan - $29/month
- All Basic features
- Priority support...

Security Audit

Low Risk
v1 • 2/18/2026

Static analysis detected 137 potential issues across external_commands, network, filesystem, and env_access categories. After semantic evaluation, all findings are FALSE POSITIVES - these patterns represent legitimate API extraction functionality. The skill uses standard shell commands (curl, jq) to communicate with Tavily's official API, accesses environment variables for API key authentication, and reads OAuth tokens from the standard MCP auth directory. No malicious behavior, data exfiltration, or command injection vulnerabilities were identified.

Files scanned: 2
Lines analyzed: 369
Findings: 8
Total audits: 1

Low Risk Issues (4)
Shell Command Execution Patterns
Static scanner flagged 62 instances of shell command execution (backticks, $() substitutions). These are FALSE POSITIVES - the skill uses standard Unix tools (curl, jq, base64) for legitimate API communication with Tavily's official service. No user input is injected into shell commands without validation.
Network Request Patterns
Static scanner flagged 33 network access instances including hardcoded URLs. These are FALSE POSITIVES - the skill is designed to make HTTPS API calls to Tavily's official endpoints (api.tavily.com, mcp.tavily.com). Network access is core functionality for web content extraction.
Environment Variable Access
Static scanner flagged 16 environment variable access instances for TAVILY_API_KEY. These are FALSE POSITIVES - the skill reads API keys from environment variables, which is the standard and secure method for providing credentials to API-based skills. The skill properly handles missing keys by initiating OAuth flow.
Filesystem Access for OAuth Tokens
Static scanner flagged filesystem access to ~/.mcp-auth/ directory. This is a FALSE POSITIVE - the skill reads OAuth tokens from the standard MCP authentication directory. This is expected behavior for OAuth-based authentication and poses no security risk.
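
The credential handling described above (use TAVILY_API_KEY when it is set, otherwise fall back to OAuth) could be sketched roughly as follows. This is an illustration in Python, not the contents of scripts/extract.sh; the function names and the "api_key"/"oauth" labels are hypothetical, and the OAuth flow and the token layout under ~/.mcp-auth/ are deliberately left out because the listing does not describe them.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the value of TAVILY_API_KEY, or None if it is unset or blank."""
    key = env.get("TAVILY_API_KEY", "").strip()
    return key or None


def auth_mode(env: Mapping[str, str] = os.environ) -> str:
    """Choose an authentication path (labels are hypothetical, for illustration)."""
    # A present API key wins; otherwise the caller would start the OAuth flow.
    return "api_key" if resolve_api_key(env) else "oauth"
```

Passing the environment in as a mapping keeps the lookup easy to exercise without touching real credentials.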

Risk Factors

⚙️ External commands (62)
🌐 Network access (33)
📁 Filesystem access (17)
🔑 Env variables (16)
Audited by: claude

Quality Score

Architecture: 45
Maintainability: 100
Content: 87
Community: 50
Security: 82
Spec Compliance: 91

What You Can Build

Research Documentation Gathering

Extract documentation content from multiple API reference pages to build a local knowledge base

Competitive Analysis

Extract content from competitor websites, product pages, and blog posts for market research

Content Aggregation

Pull articles and content from multiple news sources or blogs into a single markdown format

Try These Prompts

Basic URL Extraction
Extract the content from this URL: https://example.com/article
Multiple URLs Extraction
Extract content from these URLs: https://docs.example.com/api, https://docs.example.com/auth
Query-Focused Extraction
Extract information about authentication from these URLs: https://example.com/docs, https://example.com/api-reference. Focus on API keys and OAuth.
Advanced Extraction for Dynamic Pages
Extract all content from this JavaScript-heavy page using advanced extraction: https://app.example.com/dashboard

Best Practices

  • Use the query parameter to filter content to exactly what you need, especially when extracting from large pages
  • Start with basic extraction and only use advanced mode if content is missing or incomplete
  • Batch URLs by topic or category to keep results organized and relevant
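
The "start with basic, escalate to advanced" advice can be expressed as a small fallback routine. The sketch below assumes a caller-supplied `fetch(url, depth)` function (hypothetical) that returns extracted text or an empty result, and it uses an arbitrary `min_chars` threshold as a stand-in for "content is missing or incomplete".

```python
from typing import Callable, Optional, Tuple

# Hypothetical callable: fetch(url, depth) returns extracted text, or None/"" on failure.
Fetch = Callable[[str, str], Optional[str]]


def extract_with_fallback(url: str, fetch: Fetch, min_chars: int = 200) -> Tuple[str, str]:
    """Try basic extraction first; retry in advanced mode only if the result looks thin.

    Returns (depth_used, content). min_chars is an illustrative threshold, not an API rule.
    """
    content = fetch(url, "basic") or ""
    if len(content.strip()) >= min_chars:
        return "basic", content

    advanced = fetch(url, "advanced") or ""
    # Keep whichever attempt produced more text.
    if len(advanced.strip()) > len(content.strip()):
        return "advanced", advanced
    return "basic", content
```

Because advanced extraction is described as slower, this only pays that cost for pages where the basic pass comes back short.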

Avoid

  • Sending more than 20 URLs in a single request; the request will fail
  • Using chunks_per_source without a query parameter; this returns an error
  • Ignoring the failed_results field in the response; extraction failures will go unnoticed
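
The first two pitfalls can be caught before a request is ever sent. Here is a minimal validation sketch; `build_extract_payload` is a hypothetical helper, and the JSON field names `urls` and `extract_depth` are assumptions about the request body (only `query`, `chunks_per_source`, and the 20-URL limit are named in this listing).

```python
from typing import Any, Dict, List, Optional

MAX_URLS_PER_REQUEST = 20  # limit stated in this listing


def build_extract_payload(
    urls: List[str],
    query: Optional[str] = None,
    chunks_per_source: Optional[int] = None,
    advanced: bool = False,
) -> Dict[str, Any]:
    """Validate the arguments and return a request body as a dict."""
    if not urls:
        raise ValueError("at least one URL is required")
    if len(urls) > MAX_URLS_PER_REQUEST:
        raise ValueError(f"at most {MAX_URLS_PER_REQUEST} URLs per request, got {len(urls)}")
    if chunks_per_source is not None and not query:
        raise ValueError("chunks_per_source requires a query")

    # "urls" and "extract_depth" are assumed field names.
    payload: Dict[str, Any] = {
        "urls": list(urls),
        "extract_depth": "advanced" if advanced else "basic",
    }
    if query:
        payload["query"] = query
    if chunks_per_source is not None:
        payload["chunks_per_source"] = chunks_per_source
    return payload
```

Failing early with a `ValueError` gives a clearer message than waiting for the service to reject the request.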

Frequently Asked Questions

Do I need a Tavily API key?
Yes, you need either a Tavily API key or an existing Tavily account for OAuth authentication. Get an API key at tavily.com or sign up for an account.
How many URLs can I extract at once?
You can extract up to 20 URLs per request. For larger batches, split into multiple requests.
What is the difference between basic and advanced extraction?
Basic extraction is faster and works for static HTML pages. Advanced extraction handles JavaScript-rendered pages, complex layouts, and structured data but takes longer.
How does the query parameter work?
The query parameter reranks extracted content chunks by relevance to your search terms. Use it with chunks_per_source to get the most relevant sections.
Why am I getting failed_results?
Failed results occur when URLs are unreachable, blocked, or time out. Check the failed_results array in the response for specific error information.
Can I extract content from password-protected pages?
No, this skill cannot extract content from pages that require login or authentication beyond what's publicly accessible.
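
Two of the answers above lend themselves to a short sketch: splitting a long URL list into requests of at most 20, and separating successes from failures in a response. The response shape used here (a `results` list with `url` and `raw_content`, and `failed_results` entries with `url` and `error`) is an assumption; only the `failed_results` field itself is named in this listing.

```python
from typing import Any, Dict, Iterator, List, Tuple


def batch_urls(urls: List[str], size: int = 20) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def split_response(response: Dict[str, Any]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (content_by_url, error_by_url) from one response dict.

    Assumes entries shaped like {"url": ..., "raw_content": ...} and
    {"url": ..., "error": ...}; adjust the keys to the real response.
    """
    ok = {r["url"]: r.get("raw_content", "") for r in response.get("results", [])}
    failed = {
        f["url"]: str(f.get("error", "unknown error"))
        for f in response.get("failed_results", [])
    }
    return ok, failed
```

Collecting the failures per batch makes it straightforward to retry or report the URLs that could not be extracted.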

Developer Details

File structure

📁 scripts/

📄 extract.sh

📄 SKILL.md
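
The listing does not show the contents of scripts/extract.sh, which per the audit relies on curl and jq. For readers who want a rough idea of the kind of call involved, here is a Python sketch that only builds the HTTP request without sending it. The `/extract` path, the Bearer authorization header, and the JSON body are assumptions; only the host api.tavily.com appears in this listing.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed endpoint path on the host named in the security audit.
EXTRACT_ENDPOINT = "https://api.tavily.com/extract"


def build_request(api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the API key."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        EXTRACT_ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       data = json.load(resp)
```

Keeping request construction separate from sending makes the headers and body easy to inspect before any network access happens.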