Skill: extract

Name: extract
Author: tavily-ai

Low risk ⚙️ External commands 🌐 Network access 📁 Filesystem access 🔑 Environment variables

Extract web page content from URLs

Also available from: pbakaus

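This skill uses Tavily's extract API to pull clean markdown or text content from specific URLs. It is well suited to research, documentation extraction, and content aggregation, with no need to write custom scraper code.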
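As a rough illustration of what such a call might look like, the sketch below builds (but does not send) an HTTPS request using only the Python standard library. The endpoint path `https://api.tavily.com/extract`, the bearer-token header, and the `urls` and `format` body fields are assumptions that do not appear on this page; only the host api.tavily.com and the `TAVILY_API_KEY` variable are named here. The function name `build_extract_request` is hypothetical.

```python
import json
import urllib.request

# Assumed endpoint path; this page only names the host api.tavily.com.
EXTRACT_URL = "https://api.tavily.com/extract"


def build_extract_request(urls, api_key, fmt="markdown"):
    """Build a POST request for the assumed extract endpoint without sending it."""
    # Assumed body shape: a list of URLs plus the desired output format.
    body = json.dumps({"urls": list(urls), "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        EXTRACT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; the key would typically come from `os.environ["TAVILY_API_KEY"]`.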

Supports: Claude Codex Code(CC)

⚠️ 68 Poor

Download the skill ZIP

Upload it in Claude

Go to Settings → Capabilities → Skills → Upload skill

Turn it on and start using it

Test it

Using "extract". Extract content from https://example.com/about

Expected result:

## About Example

Welcome to Example.com...

Our Mission

We strive to provide...

Using "extract". Extract information about pricing from https://example.com/pricing and https://example.com/plans

Expected result:

## Pricing Information

### Basic Plan - $9/month
- Feature A
- Feature B

### Pro Plan - $29/month
- All Basic features
- Priority support...

Security audit

Low risk

v1 • 2/18/2026

Static analysis detected 137 potential issues across external_commands, network, filesystem, and env_access categories. After semantic evaluation, all findings are FALSE POSITIVES - these patterns represent legitimate API extraction functionality. The skill uses standard shell commands (curl, jq) to communicate with Tavily's official API, accesses environment variables for API key authentication, and reads OAuth tokens from the standard MCP auth directory. No malicious behavior, data exfiltration, or command injection vulnerabilities were identified.

Files scanned

369

Lines analyzed

Findings

Total audits

Low-risk issues (4)

scripts/extract.sh:1-167 SKILL.md:13-201

Shell Command Execution Patterns

Static scanner flagged 62 instances of shell command execution (backticks, $() substitutions). These are FALSE POSITIVES - the skill uses standard Unix tools (curl, jq, base64) for legitimate API communication with Tavily's official service. No user input is injected into shell commands without validation.

scripts/extract.sh:4-152 SKILL.md:16-189

Network Request Patterns

Static scanner flagged 33 network access instances including hardcoded URLs. These are FALSE POSITIVES - the skill is designed to make HTTPS API calls to Tavily's official endpoints (api.tavily.com, mcp.tavily.com). Network access is core functionality for web content extraction.

scripts/extract.sh:65-153 SKILL.md:24-181

Environment Variable Access

Static scanner flagged 16 environment variable access instances for TAVILY_API_KEY. These are FALSE POSITIVES - the skill reads API keys from environment variables, which is the standard and secure method for providing credentials to API-based skills. The skill properly handles missing keys by initiating OAuth flow.

scripts/extract.sh:45-163 SKILL.md:13-20

Filesystem Access for OAuth Tokens

Static scanner flagged filesystem access to ~/.mcp-auth/ directory. This is a FALSE POSITIVE - the skill reads OAuth tokens from the standard MCP authentication directory. This is expected behavior for OAuth-based authentication and poses no security risk.

Risk factors

⚙️ External commands (62)

🌐 Network access (33)

📁 Filesystem access (17)

scripts/extract.sh:45 scripts/extract.sh:17 scripts/extract.sh:26 scripts/extract.sh:32 scripts/extract.sh:50 scripts/extract.sh:60 scripts/extract.sh:98 scripts/extract.sh:98 scripts/extract.sh:115 scripts/extract.sh:116 scripts/extract.sh:128 scripts/extract.sh:134 scripts/extract.sh:163 SKILL.md:13 SKILL.md:20 SKILL.md:13 SKILL.md:20

🔑 Environment variables (16)

scripts/extract.sh:65 scripts/extract.sh:66 scripts/extract.sh:69 scripts/extract.sh:94 scripts/extract.sh:109 scripts/extract.sh:120 scripts/extract.sh:123 scripts/extract.sh:153 SKILL.md:24 SKILL.md:57 SKILL.md:69 SKILL.md:93 SKILL.md:137 SKILL.md:150 SKILL.md:167 SKILL.md:181

Audited by: claude

Quality score

Architecture

100

Maintainability

Content

Community

Security

Spec compliance

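What you can build

Research documentation collection

Extract documentation content from multiple API reference pages to build a local knowledge base

Competitive analysis

Extract content from competitor websites, product pages, and blog posts for market research

Content aggregation

Extract articles and content from multiple news sources or blogs into a single markdown format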

Try these prompts

Basic URL extraction

Extract the content from this URL: https://example.com/article

Multi-URL extraction

Extract content from these URLs: https://docs.example.com/api, https://docs.example.com/auth

Query-focused extraction

Extract information about authentication from these URLs: https://example.com/docs, https://example.com/api-reference. Focus on API keys and OAuth.

Advanced extraction for dynamic pages

Extract all content from this JavaScript-heavy page using advanced extraction: https://app.example.com/dashboard

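Best practices

Use the query parameter to filter content, especially when extracting from large pages, so you get only the content you need
Start with basic extraction and switch to advanced mode only when content is missing or incomplete
Group URLs by topic or category to keep results organized and relevant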
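The "basic first, advanced only when needed" practice can be sketched as a small wrapper. This is a minimal illustration under stated assumptions: `fetch(url, depth)` is a hypothetical callable standing in for whatever performs the extraction, the depth labels `"basic"` and `"advanced"` are assumed names, and the `min_chars` threshold for "incomplete" content is an arbitrary choice.

```python
from typing import Callable, Optional, Tuple


def extract_with_fallback(
    url: str,
    fetch: Callable[[str, str], Optional[str]],
    min_chars: int = 200,
) -> Tuple[str, Optional[str]]:
    """Try basic extraction first; retry in advanced mode only if the result looks thin.

    `fetch` is a hypothetical callable returning extracted text for one URL,
    or None on failure. Returns (depth_used, content).
    """
    content = fetch(url, "basic")
    if content is not None and len(content.strip()) >= min_chars:
        return "basic", content
    # Basic output was missing or too short, so pay the cost of advanced mode.
    return "advanced", fetch(url, "advanced")
```

Injecting `fetch` keeps the escalation logic separate from the network call, so it can be exercised with a stub.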

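Avoid

Extracting more than 20 URLs in a single request will fail
Using chunks_per_source without a query parameter returns an error
Not checking the failed_results field in the response can cause you to miss extraction failures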
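The first two pitfalls can be caught locally before any request is sent. The sketch below is one possible pre-flight check; the function name `validate_extract_args` is hypothetical, and the rule that an empty URL list is rejected is an added assumption rather than something stated on this page.

```python
MAX_URLS_PER_REQUEST = 20  # limit stated on this page


def validate_extract_args(urls, query=None, chunks_per_source=None):
    """Raise ValueError for request shapes that this page says will fail."""
    if not urls:
        # Assumption: an empty request is treated as an error here.
        raise ValueError("at least one URL is required")
    if len(urls) > MAX_URLS_PER_REQUEST:
        raise ValueError(
            f"got {len(urls)} URLs; the limit is {MAX_URLS_PER_REQUEST} per request"
        )
    if chunks_per_source is not None and not query:
        raise ValueError("chunks_per_source requires a query")
```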

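FAQ

Do I need a Tavily API key?

Yes. You need either a Tavily API key or an existing Tavily account for OAuth authentication. You can get an API key or sign up for an account at tavily.com.

How many URLs can I extract at once?

You can extract up to 20 URLs per request. For larger batches, split the work across multiple requests.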
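Splitting a larger list into request-sized groups might look like the following sketch. The helper name `batch_urls` is hypothetical; the default of 20 comes from the limit stated above.

```python
def batch_urls(urls, batch_size=20):
    """Yield consecutive groups of at most `batch_size` URLs, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]
```

Each yielded group can then be sent as its own request.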

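What is the difference between basic and advanced extraction?

Basic extraction is faster and suits static HTML pages. Advanced extraction handles JavaScript-rendered pages, complex layouts, and structured data, but takes longer.

How does the query parameter work?

The query parameter reranks the extracted content chunks by relevance to your search terms. Use it together with chunks_per_source to get the most relevant chunks.

Why am I getting failed_results?

Extraction fails when a URL is unreachable, blocked, or times out. Check the failed_results array in the response for the specific error details.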
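A response handler that never silently drops failures could be sketched as below. Only the `failed_results` field is named on this page; the `results` list and the per-item `url`, `raw_content`, and `error` keys are assumptions about the response shape, and `split_results` is a hypothetical helper.

```python
def split_results(response):
    """Separate a parsed extract response into successes and failures.

    Returns (ok, failed), each a dict keyed by URL. Key names other than
    `failed_results` are assumed, not confirmed by this page.
    """
    ok = {
        item["url"]: item.get("raw_content", "")
        for item in response.get("results", [])
    }
    failed = {
        item["url"]: item.get("error", "unknown error")
        for item in response.get("failed_results", [])
    }
    return ok, failed
```

A caller can then log or retry every URL in `failed` rather than assuming that the request succeeded as a whole.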

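Can I extract content from password-protected pages?

No. This skill cannot extract content from pages that require login or authentication; it only works with publicly accessible content.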

Developer details

Author

tavily-ai

License

MIT

Repository

https://github.com/tavily-ai/skills/tree/main/skills/tavily/extract/

Ref

main

File structure

📁 scripts/

📄 extract.sh

📄 SKILL.md