web-scrape

Name: web-scrape
Author: 21pounder

Safe

Extract clean content from any web page

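Scraping web pages by hand is slow and error-prone. This skill uses intelligent content extraction to pull clean, structured content from any URL in seconds. It handles dynamic pages, strips noise such as ads and navigation, and outputs Markdown, JSON, or plain text.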

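Supports: Claude, Codex, Claude Code (CC)

📊 70 Adequate

Download the skill ZIP

Upload it in Claude

Go to Settings → Capabilities → Skills → Upload skill

Open it and start using it

Test it

Using "web-scrape". Scrape https://example.com/blog/post-title as markdown

Expected result: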

# How to Build a REST API
**Source:** https://example.com/blog/post-title
**Date:** January 10, 2025
**Author:** Jane Developer
---
REST APIs are the backbone of modern web applications...
## Getting Started
First, install your preferred HTTP client...
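
The expected result above follows a fixed layout: a title heading, three bold metadata lines, a horizontal rule, then the body. A minimal Python sketch of assembling that layout is shown here. The function name `render_markdown` and its parameters are hypothetical and chosen for illustration; this is not the skill's own code.

```python
def render_markdown(title, source, date, author, body):
    """Assemble a page in the layout shown in the expected result."""
    header = [
        f"# {title}",
        f"**Source:** {source}",
        f"**Date:** {date}",
        f"**Author:** {author}",
        "---",
    ]
    # The body is appended unchanged apart from trimming outer whitespace.
    return "\n".join(header + [body.strip()])


page = render_markdown(
    title="How to Build a REST API",
    source="https://example.com/blog/post-title",
    date="January 10, 2025",
    author="Jane Developer",
    body="REST APIs are the backbone of modern web applications...",
)
print(page)
```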

Security audit

Safe

v3 • 1/10/2026

This skill is a prompt-based wrapper that uses MCP Playwright tools for browser automation. The supporting Node.js script (html_clean.js) performs safe HTML-to-Markdown conversion using standard libraries (cheerio, turndown) with stdin/stdout I/O only. It makes no network calls, writes no files, executes no commands, and does not access sensitive data. Security guidelines explicitly prohibit dangerous behaviors like executing page JavaScript or handling authentication.
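
To illustrate the stream-in, stream-out cleaning pattern the audit describes, here is a rough Python sketch that uses only the standard library. It is not html_clean.js: the real script is Node.js and relies on cheerio and turndown, while this stand-in only extracts plain text. The `NOISE_TAGS` set and all function names are assumptions made for the example.

```python
import io
import sys
from html.parser import HTMLParser

# Tags whose contents are treated as noise (an assumption for this sketch).
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collect visible text while skipping everything inside noise tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def clean_html(html):
    """Return the non-noise text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def run(stream_in=sys.stdin, stream_out=sys.stdout):
    """Read HTML from one stream and write cleaned text to another."""
    stream_out.write(clean_html(stream_in.read()) + "\n")


# Demonstration with in-memory streams instead of real stdin/stdout.
sample = "<html><nav>Home</nav><h1>Title</h1><script>x()</script><p>Body text</p></html>"
out = io.StringIO()
run(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

Because the function touches only the two streams it is given, it performs no network access or file writes, which mirrors the property the audit highlights.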

Files scanned

306

Lines analyzed

Findings

Total audits

No security issues found

Audited by: claude

Quality score

Architecture

100

Maintainability

Content

Community

100

Security

Spec compliance

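What you can build

Research data collection

Extract article content, documentation, and research papers from multiple sources and organize them into structured notes

API documentation capture

Save API documentation and technical content for offline reference or integration work

Content aggregation

Collect and curate content from multiple web sources for analysis or inspiration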

Try these prompts

Basic page scrape

Scrape https://example.com/article and return the content as markdown

Product data extraction

Extract product information from https://shop.example.com/product as JSON with title, price, and description

Multi-page documentation

Scrape the documentation at https://docs.example.com/getting-started. Check if there are multiple pages and ask if you should continue

Visual capture

Navigate to https://example.com and take a full-page screenshot saved as example_page.png

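Best practices

Start with the simplest scrape command and add options such as --scroll or --screenshot only when needed
Check extracted content for accuracy, especially on complex pages with dynamic elements
Respect each site's terms of service and robots.txt when scraping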
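
One way to act on the robots.txt guidance is to check a URL against the site's rules before scraping it. The sketch below uses Python's standard `urllib.robotparser`. The helper name `is_allowed` and the sample rules are made up for illustration, and the rules are parsed from a string so the example needs no network access. It covers robots.txt only; terms of service still need to be read by a person.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for example.com, used only for this demonstration.
rules = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(rules, "https://example.com/blog/post-title"))
print(is_allowed(rules, "https://example.com/private/report"))
```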

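Avoid

Do not use this skill to scrape login-protected or subscriber-only content without authorization
Do not try to bypass CAPTCHAs or access restrictions; doing so leads to failures and wastes resources
Do not scrape high-frequency or real-time data without appropriate rate limiting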
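
As a rough illustration of rate limiting, the sketch below enforces a minimum interval between consecutive requests. The class name `MinIntervalLimiter` and the interval value are assumptions for the example; the skill itself does not document a built-in limiter, so something like this would live in the calling workflow.

```python
import time


class MinIntervalLimiter:
    """Enforce a minimum number of seconds between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.last_request = None

    def wait(self):
        """Sleep if the previous request was too recent; return the delay used."""
        now = self.clock()
        delay = 0.0
        if self.last_request is not None:
            delay = max(0.0, self.min_interval - (now - self.last_request))
        if delay > 0:
            self.sleep(delay)
        # Record when this request is considered to start.
        self.last_request = now + delay
        return delay


# Call wait() before each scrape; the first call never sleeps.
limiter = MinIntervalLimiter(min_interval=0.05)
delays = [limiter.wait() for _ in range(3)]
print(delays[0])
```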

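Frequently asked questions

Which platforms is this skill compatible with?

Once Playwright MCP is configured, it works with Claude, Codex, and Claude Code.

What are the rate limits?

Limits depend on your Playwright MCP server configuration and the target site's policies.

Can I integrate it with other tools?

Yes. Use the JSON output format to get structured data that can be integrated into workflows.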
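
A downstream tool might consume that JSON roughly as follows. This sketch assumes the output is a single JSON object with the `title`, `price`, and `description` fields requested in the product extraction prompt; the exact shape of the skill's JSON is not specified in this listing, and the sample payload is invented for the example.

```python
import json

# Field names taken from the product extraction prompt.
REQUIRED_FIELDS = ("title", "price", "description")


def parse_product(raw):
    """Parse scraped JSON and keep only the expected product fields."""
    data = json.loads(raw)
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return {field: data[field] for field in REQUIRED_FIELDS}


# Hypothetical payload standing in for real skill output.
sample = (
    '{"title": "Example Widget", "price": "19.99", '
    '"description": "A placeholder product.", "extra": 1}'
)
product = parse_product(sample)
print(product["title"], product["price"])
```

Validating the fields up front lets a workflow fail early when a page did not yield the expected structure.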

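Will my scraping activity be tracked?

Activity stays local: only your Playwright instance and the target server see the requests.

Why did my scrape fail?

Common causes include timeouts, 403/404 errors, CAPTCHAs, and JavaScript-heavy pages that need the scroll option.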
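
The causes listed above can be turned into a simple triage helper. The sketch below is hypothetical: the function name `diagnose`, its parameters, and the keyword-based CAPTCHA check are assumptions for illustration and are not part of the skill.

```python
def diagnose(status_code=None, timed_out=False, body_text=""):
    """Map the symptoms of a failed scrape to one of the causes named in the FAQ."""
    if timed_out:
        return "timeout"
    if status_code in (403, 404):
        return f"http {status_code}"
    # Crude heuristic: real CAPTCHA pages may not contain this word.
    if "captcha" in body_text.lower():
        return "captcha"
    if status_code == 200 and not body_text.strip():
        return "empty page: may need the scroll option"
    return "unknown"


print(diagnose(timed_out=True))
print(diagnose(status_code=403))
print(diagnose(status_code=200, body_text=""))
```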

How is this different from curl or wget?

This skill renders JavaScript, handles dynamic content, automatically extracts clean text, and provides structured output.

Developer details

Author

21pounder

License

MIT

Repository

https://github.com/21pounder/terminalAgent/tree/main/deepresearch/.claude/skills/web-scrape

Ref

main

File structure

📁 scripts/

📄 html_clean.js

📄 SKILL.md