Skill: extract

Name: extract
Author: tavily-ai

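Low risk ⚙️ External commands 🌐 Network access 📁 Filesystem access 🔑 Environment variables

Extract web page content from URLs

Also available from: pbakaus

This skill uses Tavily's Extract API to pull clean markdown or text content from specific URLs. It is well suited to research, documentation retrieval, and content aggregation, with no custom scraper code to write.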

Supports: Claude, Codex, Code (CC)

⚠️ 68 Poor

Download the skill ZIP

Upload it in Claude

Go to Settings → Capabilities → Skills → Upload skill

Turn it on and start using it

Test it

Using "extract". Extract content from https://example.com/about

Expected result:

## About Example

Welcome to Example.com...

Our Mission

We strive to provide...

Using "extract". Extract information about pricing from https://example.com/pricing and https://example.com/plans

Expected result:

## Pricing Information

### Basic Plan - $9/month
- Feature A
- Feature B

### Pro Plan - $29/month
- All Basic features
- Priority support...

Security audit

Low risk

v1 • 2/18/2026

Static analysis detected 137 potential issues across external_commands, network, filesystem, and env_access categories. After semantic evaluation, all findings are FALSE POSITIVES - these patterns represent legitimate API extraction functionality. The skill uses standard shell commands (curl, jq) to communicate with Tavily's official API, accesses environment variables for API key authentication, and reads OAuth tokens from the standard MCP auth directory. No malicious behavior, data exfiltration, or command injection vulnerabilities were identified.

Files scanned

369

Lines analyzed

Findings

Total audits

Low-risk issues (4)

scripts/extract.sh:1-167 SKILL.md:13-201

Shell Command Execution Patterns

Static scanner flagged 62 instances of shell command execution (backticks, $() substitutions). These are FALSE POSITIVES - the skill uses standard Unix tools (curl, jq, base64) for legitimate API communication with Tavily's official service. No user input is injected into shell commands without validation.

scripts/extract.sh:4-152 SKILL.md:16-189

Network Request Patterns

Static scanner flagged 33 network access instances including hardcoded URLs. These are FALSE POSITIVES - the skill is designed to make HTTPS API calls to Tavily's official endpoints (api.tavily.com, mcp.tavily.com). Network access is core functionality for web content extraction.

scripts/extract.sh:65-153 SKILL.md:24-181

Environment Variable Access

Static scanner flagged 16 environment variable access instances for TAVILY_API_KEY. These are FALSE POSITIVES - the skill reads API keys from environment variables, which is the standard and secure method for providing credentials to API-based skills. The skill properly handles missing keys by initiating OAuth flow.

scripts/extract.sh:45-163 SKILL.md:13-20

Filesystem Access for OAuth Tokens

Static scanner flagged filesystem access to ~/.mcp-auth/ directory. This is a FALSE POSITIVE - the skill reads OAuth tokens from the standard MCP authentication directory. This is expected behavior for OAuth-based authentication and poses no security risk.

Risk factors

⚙️ External commands (62)

🌐 Network access (33)

📁 Filesystem access (17)

scripts/extract.sh:45 scripts/extract.sh:17 scripts/extract.sh:26 scripts/extract.sh:32 scripts/extract.sh:50 scripts/extract.sh:60 scripts/extract.sh:98 scripts/extract.sh:98 scripts/extract.sh:115 scripts/extract.sh:116 scripts/extract.sh:128 scripts/extract.sh:134 scripts/extract.sh:163 SKILL.md:13 SKILL.md:20 SKILL.md:13 SKILL.md:20

🔑 Environment variables (16)

scripts/extract.sh:65 scripts/extract.sh:66 scripts/extract.sh:69 scripts/extract.sh:94 scripts/extract.sh:109 scripts/extract.sh:120 scripts/extract.sh:123 scripts/extract.sh:153 SKILL.md:24 SKILL.md:57 SKILL.md:69 SKILL.md:93 SKILL.md:137 SKILL.md:150 SKILL.md:167 SKILL.md:181

Audited by: claude

Quality score

Architecture

100

Maintainability

Content

Community

Security

Spec compliance

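What you can build

Research documentation gathering

Extract documentation content from multiple API reference pages to build a local knowledge base

Competitive analysis

Extract content from competitor websites, product pages, and blog posts for market research

Content aggregation

Pull articles and content from multiple news sources or blogs and consolidate them into a single markdown format

Try these prompts

Basic URL extraction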

Extract the content from this URL: https://example.com/article

Multi-URL extraction

Extract content from these URLs: https://docs.example.com/api, https://docs.example.com/auth

Query-focused extraction

Extract information about authentication from these URLs: https://example.com/docs, https://example.com/api-reference. Focus on API keys and OAuth.

Advanced extraction for dynamic pages

Extract all content from this JavaScript-heavy page using advanced extraction: https://app.example.com/dashboard

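Best practices

Use the query parameter to narrow the content to exactly what you need, especially when extracting from large pages
Start with basic extraction and switch to advanced mode only when content is missing or incomplete
Batch URLs by topic or category to keep results organized and relevant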
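
The "basic first, advanced only when needed" practice can be sketched as a small fallback helper. This is a minimal illustration, not the skill's own logic: `extract` is a hypothetical callable you supply (for example a wrapper around the Tavily API), the depth values `"basic"` and `"advanced"` are assumed names for the two modes, and the `min_chars` threshold is an arbitrary heuristic for "content is missing or incomplete".

```python
def extract_with_fallback(url, extract, min_chars=200):
    """Try basic extraction first; retry in advanced mode if the result looks empty.

    `extract(url, depth)` is a hypothetical callable returning the extracted
    text (or None). `min_chars` is an arbitrary cutoff, not a Tavily setting.
    """
    content = extract(url, "basic")
    if content and len(content.strip()) >= min_chars:
        return content, "basic"
    # Basic mode returned too little, so pay the extra latency of advanced mode.
    return extract(url, "advanced"), "advanced"
```

The helper returns the depth that produced the content, which makes it easy to log how often the slower mode was actually needed.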

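Avoid

Sending more than 20 URLs in a single request, which will fail
Using chunks_per_source without a query parameter, which returns an error
Skipping the failed_results field in the response, which can hide failed extractions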
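
The first two pitfalls can be caught before a request is ever sent. The sketch below validates the inputs and builds a request body; `chunks_per_source` and the query parameter come from this listing, while the `urls` and `extract_depth` field names are assumptions about the request format and should be checked against Tavily's API reference.

```python
MAX_URLS_PER_REQUEST = 20  # limit stated in this listing


def build_extract_payload(urls, query=None, chunks_per_source=None, extract_depth="basic"):
    """Validate inputs and return a request body as a plain dict.

    Field names other than `query` and `chunks_per_source` are assumptions.
    """
    urls = list(urls)
    if not urls:
        raise ValueError("at least one URL is required")
    if len(urls) > MAX_URLS_PER_REQUEST:
        raise ValueError(f"at most {MAX_URLS_PER_REQUEST} URLs per request, got {len(urls)}")
    if chunks_per_source is not None and not query:
        raise ValueError("chunks_per_source requires a query")

    payload = {"urls": urls, "extract_depth": extract_depth}
    if query:
        payload["query"] = query
    if chunks_per_source is not None:
        payload["chunks_per_source"] = chunks_per_source
    return payload
```

Failing locally with a clear message is cheaper than a round trip that ends in an API error.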

FAQ

Do I need a Tavily API key?

Yes. You need either a Tavily API key or an existing Tavily account for OAuth authentication. You can get an API key or sign up at tavily.com.
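
As a rough illustration of the API-key path, the sketch below reads `TAVILY_API_KEY` from the environment and builds an authenticated request without sending it. The `/extract` path and the Bearer-token header are assumptions based on Tavily's public API, since this listing only names the host api.tavily.com. The OAuth fallback is not modeled here.

```python
import json
import os
import urllib.request

# Assumed endpoint path; this listing only names the host api.tavily.com.
EXTRACT_URL = "https://api.tavily.com/extract"


def build_extract_request(urls, env=None):
    """Build (but do not send) an authenticated POST request for the given URLs."""
    env = os.environ if env is None else env
    api_key = env.get("TAVILY_API_KEY")
    if not api_key:
        # The skill falls back to an OAuth flow at this point; this sketch just stops.
        raise RuntimeError("TAVILY_API_KEY is not set")
    body = json.dumps({"urls": list(urls)}).encode("utf-8")
    return urllib.request.Request(
        EXTRACT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would perform the call; keeping construction separate makes the request easy to inspect first.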

How many URLs can I extract at once?

Up to 20 URLs per request. For larger batches, split the work into multiple requests.
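
Splitting a larger list into request-sized groups takes only a few lines. This is a generic sketch; `batch_urls` is an illustrative helper name, not part of the skill.

```python
def batch_urls(urls, batch_size=20):
    """Split URLs into consecutive batches of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    urls = list(urls)
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]
```

For example, 45 URLs become three batches of 20, 20, and 5, each of which fits in one request.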

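What is the difference between basic and advanced extraction?

Basic extraction is faster and works well for static HTML pages. Advanced extraction handles JavaScript-rendered pages, complex layouts, and structured data, but takes longer.

How does the query parameter work?

The query parameter reranks the extracted content chunks by relevance to your search terms. Combine it with chunks_per_source to get the most relevant sections.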

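Why am I getting failed_results?

Failed results appear when a URL is inaccessible, blocked, or times out. Check the failed_results array in the response for the specific error messages.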
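
One way to make sure failures are never silently dropped is to separate them from successes as soon as the response is parsed. In the sketch below only `failed_results` is named by this listing; the `results`, `url`, `raw_content`, and `error` keys are assumptions about the response shape.

```python
def split_results(response):
    """Return (succeeded, failed) dicts keyed by URL from a parsed JSON response.

    Only `failed_results` is named in this listing; the other keys are assumed.
    """
    succeeded = {
        item["url"]: item.get("raw_content", "")
        for item in response.get("results", [])
    }
    failed = {
        item["url"]: item.get("error", "unknown error")
        for item in response.get("failed_results", [])
    }
    return succeeded, failed
```

A caller can then retry or report everything in `failed` instead of assuming that every requested URL came back.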

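Can I extract content from password-protected pages?

No. This skill cannot extract content from pages that require login or authentication; it is limited to publicly accessible content.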

Developer details

Author

tavily-ai

License

MIT

Repository

https://github.com/tavily-ai/skills/tree/main/skills/tavily/extract/

Ref

main

File structure

📁 scripts/

📄 extract.sh

📄 SKILL.md