Skill: web-search-scraper-api-skill

Name: web-search-scraper-api-skill
Author: browser-act

Low risk · 🌐 Network access · 🔑 Environment variables · ⚙️ External commands

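Extract Markdown from any website URL

Web scraping often fails because of CAPTCHAs, rate limits, or complex JavaScript rendering. This skill uses the BrowserAct API to reliably extract clean, complete Markdown from any URL without running into those obstacles.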

Supports: Claude Codex Code(CC)

📊 69 (Adequate)

Download the skill ZIP

Upload in Claude

Go to Settings → Capabilities → Skills → Upload skill

Turn it on and start using it

Test it

Using "web-search-scraper-api-skill". Extract markdown from https://example.com/blog/post

Expected result:

Successfully extracted 2,450 words of markdown content including all headings, code blocks, and formatted text from the article.

Using "web-search-scraper-api-skill". Scrape this tutorial page: https://docs.example.com/getting-started

Expected result:

Converted 15 sections of documentation into clean markdown with preserved headings, lists, and code examples.

Security audit

Low risk

v2 • 5/21/2026

All 37 static analysis findings were evaluated and determined to be false positives in context. The Python script is a standard API client that makes HTTP requests to a single legitimate endpoint (api.browseract.com). The env_access findings correspond to reading a BROWSERACT_API_KEY from environment variables, which is a standard and necessary pattern for authenticating with the API service. The external_commands findings are markdown code block examples in SKILL.md, not executable code. The weak cryptographic algorithm and system reconnaissance findings had no supporting evidence in the actual file contents. The skill performs its documented purpose with no malicious intent or data exfiltration patterns detected.

Files scanned

173

Lines analyzed

Findings

Total audits

Low-risk issues (5)

SKILL.md:35-38

External Commands in Markdown Code Blocks

SKILL.md contains bash command examples inside markdown code fences. These are documentation examples, not executed commands. The static analyzer misinterpreted markdown backtick syntax as shell backtick execution. No actual command injection risk exists.

scripts/web_search_scraper_api.py:86

Environment Variable Access for API Authentication

The script reads BROWSERACT_API_KEY from environment variables. This is a standard and secure pattern for API credential management. The API key is used only to authenticate with the documented BrowserAct API service at api.browseract.com.

scripts/web_search_scraper_api.py:29 scripts/web_search_scraper_api.py:46 scripts/web_search_scraper_api.py:69

Network Requests to Legitimate API Endpoint

All HTTP requests target api.browseract.com, a documented third-party web scraping API. The requests are for starting tasks, polling status, and retrieving results. No data exfiltration or communication with unknown endpoints.

SKILL.md:3

Weak Cryptographic Algorithm (False Detection)

Static analyzer flagged SKILL.md lines 3, 10, 29, and 47 for weak cryptographic algorithms. These lines contain markdown headings, descriptions, and parameter documentation. No cryptographic operations exist in this skill.

SKILL.md:55-56

System Reconnaissance (False Detection)

SKILL.md lines 55-56 describe error handling logic for checking if API responses contain 'Invalid authorization'. This is standard error message parsing, not system reconnaissance.

Audited by: claude

Quality score

Architecture

100

Maintainability

Content

Community

Security

Spec compliance

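What you can build

Research data collection

Automatically extract article content, documentation, and reference material from multiple URLs for research, with no manual copy and paste.

Content aggregation pipelines

Feed the extracted markdown into an AI system for summarization, analysis, or reformatting. Well suited to building content processing pipelines.

Documentation archiving

Download and archive technical documentation, tutorials, and API references as markdown for offline access or backup.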

Try these prompts

Extract a single article

Extract the markdown content from this URL: ${url}

Batch URL extraction

Use the web scraper skill to extract markdown from each of these URLs: ${urls}. Process them one by one and return the content.

Documentation scraping

Extract all content from the documentation at ${url} as markdown so I can read it offline.

Content extraction with a fallback

Try to extract the article content from ${url}. If the API key is missing, ask me for it first before attempting the extraction.

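Best practices

Confirm the target URL is reachable before calling the scraper, to avoid unnecessary API calls
Handle a missing API key gracefully by prompting the user before continuing
Implement retry logic for transient failures (a single retry), but stop on authorization errors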
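
The retry rule above can be sketched as follows. This is a minimal illustration, not the skill's actual script: `AuthorizationError` and `run_with_single_retry` are hypothetical names, and it assumes the caller raises `AuthorizationError` when the API rejects the key.

```python
class AuthorizationError(Exception):
    """Hypothetical error type for a rejected API key (not from the skill)."""


def run_with_single_retry(task):
    """Call task(); retry once on failure, but never on AuthorizationError."""
    try:
        return task()
    except AuthorizationError:
        # An invalid key will not fix itself, so surface it immediately.
        raise
    except Exception:
        # One retry only; a second failure propagates to the caller.
        return task()
```

Keeping the authorization case as its own exception type means the "stop" rule is enforced by control flow rather than by inspecting error strings at every call site.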

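Avoid

Do not pass untrusted URLs without validation: this skill expects well-formed HTTP/HTTPS URLs
Do not ignore API key errors: always report authentication failures to the user
Do not scrape the same URL repeatedly within a short period: respect rate limits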
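
One way to check that a URL is well-formed HTTP/HTTPS before handing it to the scraper is sketched below. The helper name `is_valid_http_url` is an assumption for illustration; it uses only the standard library's `urllib.parse` and does not check that the host is reachable.

```python
from urllib.parse import urlparse


def is_valid_http_url(url: str) -> bool:
    """Return True only for http:// or https:// URLs that include a host."""
    try:
        parsed = urlparse(url.strip())
        # hostname is None when the URL has no network location.
        return parsed.scheme in ("http", "https") and bool(parsed.hostname)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs (e.g. bad IPv6).
        return False
```

For example, `is_valid_http_url("https://example.com/blog/post")` passes, while `ftp://` URLs, bare hostnames, and `javascript:` URLs are rejected.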

FAQ

What do I need to use this skill?

You need a BrowserAct API key set in the BROWSERACT_API_KEY environment variable. Sign up at browseract.com to get your key.
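
A minimal sketch of reading the key, assuming a blank value should be treated the same as a missing one. Only the variable name `BROWSERACT_API_KEY` comes from this page; the helper `load_api_key` is hypothetical and is not taken from the skill's script.

```python
import os


def load_api_key(env=os.environ):
    """Return the BrowserAct API key, or None if it is unset or blank."""
    key = env.get("BROWSERACT_API_KEY", "").strip()
    return key or None
```

A caller can then prompt the user when the result is `None` instead of sending a request that is bound to fail authentication.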

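How long does extraction take?

Most pages take 10 to 60 seconds to extract, depending on complexity. The script polls for completion and reports status every 10 seconds.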
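
The polling behaviour described above might look roughly like this. It is a sketch under stated assumptions: `fetch_status` is a caller-supplied function standing in for the real status request, the status strings `"finished"` and `"failed"` are hypothetical, and the 120-second timeout is an illustrative default rather than a documented value. Only the 10-second interval comes from this page.

```python
import time


def wait_for_result(fetch_status, interval=10.0, timeout=120.0,
                    sleep=time.sleep, report=print):
    """Poll fetch_status() until it returns a terminal status or time runs out."""
    waited = 0.0
    while True:
        status = fetch_status()
        report(f"status after {waited:.0f}s: {status}")
        if status in ("finished", "failed"):  # hypothetical terminal states
            return status
        if waited >= timeout:
            raise TimeoutError(f"task still '{status}' after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Passing `sleep` and `report` as parameters keeps the loop testable without real delays or console output.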

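Can it get past CAPTCHAs?

Yes. BrowserAct handles CAPTCHAs and bot detection automatically through browser automation.

What formats are supported?

Any HTTP or HTTPS URL works. The output is always clean Markdown with structure, headings, and code blocks preserved.

Is there a retry mechanism?

Yes. If a request fails for a reason other than an authorization error, the agent automatically retries once. An invalid API key is not retried.

Are there rate limits?

BrowserAct applies rate limits based on your subscription plan. This skill is designed to stay within reasonable usage patterns.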
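
On the client side, one simple way to avoid re-scraping the same URL in quick succession is a per-URL cooldown. The class below is a hypothetical sketch, not part of the skill, and the 60-second default is an arbitrary assumption; the actual limits depend on your BrowserAct plan.

```python
import time


class UrlCooldown:
    """Refuse to re-scrape a URL until cooldown_seconds have passed."""

    def __init__(self, cooldown_seconds=60.0, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self._last_seen = {}  # url -> time of the last allowed request

    def allow(self, url):
        """Return True and record the request if the URL is not cooling down."""
        now = self.clock()
        last = self._last_seen.get(url)
        if last is not None and now - last < self.cooldown_seconds:
            return False
        self._last_seen[url] = now
        return True
```

Calling `allow(url)` before each extraction lets a batch job skip or defer duplicates without any server round trip.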

Developer details

Author

browser-act

License

MIT

Repository

https://github.com/browser-act/skills/tree/main/solutions/search-research/web-search-scraper-api-skill

Ref

main

File structure

📁 scripts/

📄 web_search_scraper_api.py

📄 SKILL.md