extract

Name: extract
Author: tavily-ai

Low Risk · ⚙️ External commands · 🌐 Network access · 📁 Filesystem access · 🔑 Env variables

Extract Web Content from URLs

Also available from: pbakaus

This skill extracts clean markdown or plain-text content from specific URLs using Tavily's extraction API. It is useful for research, documentation retrieval, and content aggregation without writing custom scraping code.
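Here is a minimal Python sketch of a request to Tavily's extract endpoint. The endpoint URL, the `urls`/`extract_depth`/`format` body fields, and the Bearer-token header are assumptions based on Tavily's public REST API. They are not taken from this skill's `scripts/extract.sh`, so check the official API reference before relying on them. The function only builds the request. Nothing is sent until you call `send`.

```python
import json
import urllib.request

# Assumed endpoint; confirm against Tavily's API reference.
TAVILY_EXTRACT_URL = "https://api.tavily.com/extract"


def build_extract_request(urls, api_key, extract_depth="basic", fmt="markdown"):
    """Build (but do not send) a POST request for the extract endpoint.

    Body field names and the Bearer header are assumptions, not copied
    from the skill's scripts/extract.sh.
    """
    body = {"urls": list(urls), "extract_depth": extract_depth, "format": fmt}
    return urllib.request.Request(
        TAVILY_EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON reply (requires network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Typical use: `send(build_extract_request(["https://example.com/about"], os.environ["TAVILY_API_KEY"]))`, which returns the decoded JSON response.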

Supports: Claude, Codex, Claude Code (CC)

⚠️ Quality score: 68 (Poor)

1. Download the skill ZIP.
2. Upload it in Claude: go to Settings → Capabilities → Skills → Upload skill.
3. Toggle the skill on and start using it.

Test it

Using "extract". Extract content from https://example.com/about

Expected outcome:

## About Example

Welcome to Example.com...

### Our Mission

We strive to provide...

Using "extract". Extract information about pricing from https://example.com/pricing and https://example.com/plans

Expected outcome:

## Pricing Information

### Basic Plan - $9/month
- Feature A
- Feature B

### Pro Plan - $29/month
- All Basic features
- Priority support...

Security Audit

Low Risk

v1 • 2/18/2026

Static analysis detected 137 potential issues across the external_commands, network, filesystem, and env_access categories. After semantic evaluation, all findings were judged FALSE POSITIVES: the flagged patterns reflect the skill's legitimate API extraction functionality. The skill uses standard shell commands (curl, jq) to call Tavily's official API, reads the API key from an environment variable, and reads OAuth tokens from the standard MCP auth directory. No malicious behavior, data exfiltration, or command-injection vulnerabilities were identified.

Low Risk Issues (4)

Shell Command Execution Patterns

Location: scripts/extract.sh:1-167, SKILL.md:13-201

Static scanner flagged 62 instances of shell command execution (backticks, $() substitutions). These are FALSE POSITIVES: the skill uses standard Unix tools (curl, jq, base64) for legitimate communication with Tavily's official API. No user input is injected into shell commands without validation.
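The audit says no user input reaches a shell unvalidated. That is easiest to see when commands are built as argument lists. The Python sketch below assembles a curl invocation without a shell, so characters such as `;`, `$()`, or backticks inside a URL stay literal. It is an illustration, not the skill's actual shell script. The endpoint and header format are assumptions.

```python
import json
import subprocess


def curl_argv(url_list, api_key):
    """Build a curl command as an argument list (no shell involved).

    Because subprocess receives a list, shell metacharacters inside a URL
    are passed to curl literally. json.dumps escapes quotes in the body.
    """
    payload = json.dumps({"urls": list(url_list)})
    return [
        "curl", "-sS", "-X", "POST", "https://api.tavily.com/extract",  # assumed endpoint
        "-H", "Content-Type: application/json",
        "-H", f"Authorization: Bearer {api_key}",  # assumed auth scheme
        "-d", payload,
    ]


def run_extract(url_list, api_key):
    """Run curl and parse its JSON output (requires curl and network access)."""
    out = subprocess.run(curl_argv(url_list, api_key),
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)
```

There is one trade-off. Command-line arguments can be visible to other local users through process listings, so a hardened version would keep the API key out of argv.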

Network Request Patterns

Location: scripts/extract.sh:4-152, SKILL.md:16-189

Static scanner flagged 33 network access instances, including hardcoded URLs. These are FALSE POSITIVES: the skill is designed to make HTTPS calls to Tavily's official endpoints (api.tavily.com, mcp.tavily.com). Network access is core functionality for web content extraction.

Environment Variable Access

Location: scripts/extract.sh:65-153, SKILL.md:24-181

Static scanner flagged 16 instances of environment variable access for TAVILY_API_KEY. These are FALSE POSITIVES: reading API keys from environment variables is the standard, secure way to supply credentials to API-based skills. When the key is missing, the skill falls back to an OAuth flow.

Filesystem Access for OAuth Tokens

Location: scripts/extract.sh:45-163, SKILL.md:13-20

Static scanner flagged filesystem access to the ~/.mcp-auth/ directory. This is a FALSE POSITIVE: the skill reads OAuth tokens from the standard MCP authentication directory, which is expected behavior for OAuth-based authentication and poses no security risk.
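The audit describes an authentication order: an API key from `TAVILY_API_KEY` first, then an OAuth token from `~/.mcp-auth/`. That order can be sketched as follows. This page does not document the layout of the token directory. The assumption that tokens live in JSON files with an `access_token` field is hypothetical, and the function name is illustrative.

```python
import json
import os
from pathlib import Path


def resolve_tavily_credential(env=None, auth_dir=None):
    """Return ("api_key", value), ("oauth", token), or (None, None).

    The TAVILY_API_KEY variable takes priority. Otherwise JSON files under
    the auth directory are scanned for a hypothetical "access_token" field.
    """
    env = os.environ if env is None else env
    key = env.get("TAVILY_API_KEY", "").strip()
    if key:
        return "api_key", key

    auth_dir = Path(auth_dir) if auth_dir else Path.home() / ".mcp-auth"
    if auth_dir.is_dir():
        # Sorted so the choice is deterministic when several files exist.
        for path in sorted(auth_dir.rglob("*.json")):
            try:
                data = json.loads(path.read_text())
            except (OSError, ValueError):
                continue  # unreadable or not valid JSON: skip it
            token = data.get("access_token") if isinstance(data, dict) else None
            if token:
                return "oauth", token
    return None, None  # a caller would start an OAuth flow at this point
```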

Risk Factors

⚙️ External commands (62)

🌐 Network access (33)

📁 Filesystem access (17)

scripts/extract.sh:45 scripts/extract.sh:17 scripts/extract.sh:26 scripts/extract.sh:32 scripts/extract.sh:50 scripts/extract.sh:60 scripts/extract.sh:98 scripts/extract.sh:98 scripts/extract.sh:115 scripts/extract.sh:116 scripts/extract.sh:128 scripts/extract.sh:134 scripts/extract.sh:163 SKILL.md:13 SKILL.md:20 SKILL.md:13 SKILL.md:20

🔑 Env variables (16)

scripts/extract.sh:65 scripts/extract.sh:66 scripts/extract.sh:69 scripts/extract.sh:94 scripts/extract.sh:109 scripts/extract.sh:120 scripts/extract.sh:123 scripts/extract.sh:153 SKILL.md:24 SKILL.md:57 SKILL.md:69 SKILL.md:93 SKILL.md:137 SKILL.md:150 SKILL.md:167 SKILL.md:181

Audited by: claude

Quality Score

Architecture: 100

Maintainability

Content

Community

Security

Spec Compliance

What You Can Build

Research Documentation Gathering

Extract documentation content from multiple API reference pages to build a local knowledge base

Competitive Analysis

Extract content from competitor websites, product pages, and blog posts for market research

Content Aggregation

Pull articles and content from multiple news sources or blogs into a single markdown format
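For the content-aggregation use case, merging per-URL results into one markdown file takes only a few lines of code. This sketch assumes each result is a dict with `url` and `raw_content` keys, which is our understanding of Tavily's extract response. `aggregate_markdown` is a hypothetical helper, not part of the skill.

```python
def aggregate_markdown(results):
    """Merge extracted pages into a single markdown document.

    Each non-empty page becomes a section headed by its source URL.
    Sections are separated by horizontal rules.
    """
    sections = []
    for item in results:
        content = (item.get("raw_content") or "").strip()
        if content:  # skip pages that came back empty
            sections.append(f"## Source: {item['url']}\n\n{content}")
    return "\n\n---\n\n".join(sections)
```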

Try These Prompts

Basic URL Extraction

Extract the content from this URL: https://example.com/article

Multiple URLs Extraction

Extract content from these URLs: https://docs.example.com/api, https://docs.example.com/auth

Query-Focused Extraction

Extract information about authentication from these URLs: https://example.com/docs, https://example.com/api-reference. Focus on API keys and OAuth.

Advanced Extraction for Dynamic Pages

Extract all content from this JavaScript-heavy page using advanced extraction: https://app.example.com/dashboard

Best Practices

- Use the query parameter to focus results on what you need, especially when extracting from large pages.
- Start with basic extraction, and switch to advanced mode only if content is missing or incomplete.
- Batch URLs by topic or category to keep results organized and relevant.
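The "basic first, advanced only when needed" practice amounts to a two-pass strategy. In this sketch the `fetch(urls, depth)` callable is a hypothetical stand-in for whatever actually calls the API. The 200-character cutoff for "incomplete" content is an arbitrary heuristic chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical signature: fetch(urls, depth) -> {url: extracted_text}
Fetch = Callable[[List[str], str], Dict[str, str]]


def extract_with_fallback(urls: List[str], fetch: Fetch, min_chars: int = 200) -> Dict[str, str]:
    """Run basic extraction, then retry thin or missing pages with advanced."""
    content = dict(fetch(urls, "basic"))
    # Pages with little or no text are treated as incomplete (arbitrary threshold).
    retry = [u for u in urls if len(content.get(u, "")) < min_chars]
    if retry:
        for url, text in fetch(retry, "advanced").items():
            # Keep whichever version has more text.
            if len(text) > len(content.get(url, "")):
                content[url] = text
    return content
```

Only the pages that came back thin are re-requested, so the slower advanced mode is used as little as possible.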

Avoid

- Requesting more than 20 URLs at once: the request will fail.
- Setting chunks_per_source without a query parameter: the API returns an error.
- Ignoring the failed_results field in the response: you may miss extraction failures.
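A client can catch the failure cases listed above before any request is sent. The checks below encode only the rules stated on this page: at most 20 URLs, `chunks_per_source` requires `query`, and the two documented depth modes. The function name and the returned payload shape are illustrative assumptions.

```python
MAX_URLS_PER_REQUEST = 20  # limit stated in this page's FAQ


def validate_extract_params(urls, query=None, chunks_per_source=None,
                            extract_depth="basic"):
    """Raise ValueError for combinations this page says will fail.

    Returns a request payload dict when the parameters are acceptable.
    """
    if not urls:
        raise ValueError("at least one URL is required")
    if len(urls) > MAX_URLS_PER_REQUEST:
        raise ValueError(f"{len(urls)} URLs given; the limit is {MAX_URLS_PER_REQUEST}")
    if chunks_per_source is not None and not query:
        raise ValueError("chunks_per_source requires a query")
    if extract_depth not in ("basic", "advanced"):
        raise ValueError("extract_depth must be 'basic' or 'advanced'")

    payload = {"urls": list(urls), "extract_depth": extract_depth}
    if query:
        payload["query"] = query
    if chunks_per_source is not None:
        payload["chunks_per_source"] = chunks_per_source
    return payload
```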

Frequently Asked Questions

Do I need a Tavily API key?

Yes. You need either a Tavily API key or an existing Tavily account to authenticate through OAuth. You can get an API key or create an account at tavily.com.

How many URLs can I extract at once?

You can extract up to 20 URLs per request. For larger batches, split into multiple requests.
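Splitting a larger list into compliant requests is straightforward. This sketch also drops duplicate URLs, keeping first occurrences, so they do not use up the 20-URL budget. That deduplication is a design choice of the example, not something the skill is documented to do.

```python
def batch_urls(urls, size=20):
    """Split URLs into request-sized batches (20 per request, per the FAQ).

    Duplicates are removed first, keeping the first occurrence of each URL.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    unique = list(dict.fromkeys(urls))  # order-preserving de-duplication
    return [unique[i:i + size] for i in range(0, len(unique), size)]
```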

What is the difference between basic and advanced extraction?

Basic extraction is faster and works for static HTML pages. Advanced extraction handles JavaScript-rendered pages, complex layouts, and structured data but takes longer.

How does the query parameter work?

The query parameter reranks extracted content chunks by relevance to your search terms. Use it with chunks_per_source to get the most relevant sections.

Why am I getting failed_results?

Failed results occur when URLs are unreachable, blocked, or time out. Check the failed_results array in the response for specific error information.
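A small helper can separate successes from failures so that none are silently ignored. The response shape assumed here reflects our understanding of Tavily's API and should be checked against its documentation: `results` entries carry `url` and `raw_content`, and `failed_results` entries carry `url` and `error`.

```python
def split_response(response):
    """Return ({url: content}, {url: error}) from an extract response.

    Missing keys are tolerated so that a partial response does not raise.
    """
    ok = {r["url"]: r.get("raw_content", "") for r in response.get("results", [])}
    failed = {f["url"]: f.get("error", "unknown error")
              for f in response.get("failed_results", [])}
    return ok, failed
```

Logging or retrying every key in the second dict ensures that no failed URL goes unnoticed.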

Can I extract content from password-protected pages?

No. This skill can only extract publicly accessible content; it cannot reach pages that require a login or other authentication.

Developer Details

Author

tavily-ai

License

MIT

Repository

https://github.com/tavily-ai/skills/tree/main/skills/tavily/extract/

Ref

main

File structure

📁 scripts/
    📄 extract.sh
📄 SKILL.md