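Manual web scraping is time-consuming and error-prone. This skill uses intelligent content extraction to pull clean, structured content from any URL in seconds. It handles dynamic pages, automatically strips ads, navigation, and other clutter, and outputs markdown, JSON, or plain text.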
Download the skill ZIP
Upload to Claude
Go to Settings → Capabilities → Skills → Upload skill
Enable it and start using it
Test it
Using "web-scrape". Scrape https://example.com/blog/post-title as markdown
Expected result:
- # How to Build a REST API
- **Source:** https://example.com/blog/post-title
- **Date:** January 10, 2025
- **Author:** Jane Developer
- ---
- REST APIs are the backbone of modern web applications…
- ## Getting Started
- First, install your preferred HTTP client…
Security Audit
Safe
This skill is a prompt-based wrapper that uses MCP Playwright tools for browser automation. The supporting Node.js script (html_clean.js) performs safe HTML-to-markdown conversion using standard libraries (cheerio, turndown) with stdin/stdout I/O only. The script makes no network calls and performs no file writes, command execution, or sensitive data access. Security guidelines explicitly prohibit dangerous behaviors like executing page JavaScript or handling authentication.
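The kind of cleanup described above (dropping navigation, scripts, and footers, then emitting markdown-style text) can be illustrated with a small sketch. This is not the skill's html_clean.js, which uses cheerio and turndown; it is a Python standard-library stand-in, and the tag lists and the names `MarkdownishExtractor` and `html_to_markdown` are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags whose entire contents are discarded (an assumed list for this sketch).
SKIP_TAGS = {"script", "style", "nav", "aside", "footer", "header"}
# Markdown prefixes for the heading levels this sketch supports.
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
# Tags treated as separate output blocks.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "li"}


class MarkdownishExtractor(HTMLParser):
    """Collects text from block tags while ignoring clutter tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []        # finished output blocks
        self._buf = []          # text pieces of the block being built
        self._prefix = ""       # markdown prefix for the current block
        self._skip_depth = 0    # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif self._skip_depth == 0 and tag in BLOCK_TAGS:
            self.flush()
            self._prefix = HEADINGS.get(tag, "- " if tag == "li" else "")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif self._skip_depth == 0 and tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buf.append(data)

    def flush(self):
        # Collapse runs of whitespace and store the block if it has any text.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(self._prefix + text)
        self._buf = []
        self._prefix = ""


def html_to_markdown(html: str) -> str:
    """Return a markdown-like rendering of the visible content in `html`."""
    parser = MarkdownishExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.blocks)
```

Like the script described in the audit, this sketch only transforms a string it is given; it performs no network or file access.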
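What You Can Build
Research data collection
Extract article content, documentation, and research papers from multiple sources and organize them into structured notes
API documentation capture
Save API documentation and technical content for offline reference or integration work
Content aggregation
Collect and curate content from multiple web sources for analysis or inspiration
Try These Prompts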
Scrape https://example.com/article and return the content as markdown
Extract product information from https://shop.example.com/product as JSON with title, price, and description
Scrape the documentation at https://docs.example.com/getting-started. Check if there are multiple pages and ask if you should continue
Navigate to https://example.com and take a full-page screenshot saved as example_page.png
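Best Practices
- Start with the simplest scrape command and add options such as --scroll or --screenshot only when needed
- Check extracted content for accuracy, especially on complex pages with dynamic elements
- Respect each site's terms of service and robots.txt when scraping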
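As one way to act on the robots.txt advice, the sketch below checks whether a URL may be fetched before scraping it. It uses Python's standard `urllib.robotparser`; the robots.txt content, the user agent name `example-bot`, and the helper names are hypothetical, and the rules are inlined so the example needs no network access. It is not part of the skill itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been retrieved."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_scrape(parser: RobotFileParser, url: str,
               user_agent: str = "example-bot") -> bool:
    """Return True if the parsed rules allow `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
allowed = may_scrape(parser, "https://example.com/blog/post-title")
blocked = may_scrape(parser, "https://example.com/private/report")
delay = parser.crawl_delay("example-bot")  # seconds requested by the site, if any
```

In practice the rules would be fetched from the target site (for example with `RobotFileParser.set_url` and `read`) rather than inlined.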
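Avoid
- Do not use this skill to scrape login-protected or subscription-only content without authorization
- Do not attempt to bypass CAPTCHAs or access restrictions; doing so will fail and waste resources
- Do not scrape high-frequency or real-time data without appropriate rate limiting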
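The rate-limiting point can be sketched as a minimum-interval limiter that is consulted before each request. This is an illustrative assumption, not something the skill ships: the class name `MinIntervalLimiter` and its parameters are hypothetical, and the clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Enforces at least `min_interval` seconds between calls to wait()."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last = now
        return slept
```

A caller would create one limiter per site, for example `MinIntervalLimiter(2.0)`, and call `wait()` immediately before each scrape request.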
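Frequently Asked Questions
Which platforms is this skill compatible with?
What are the rate limits?
Can I integrate it with other tools?
Is my scraping activity tracked?
Why did my scrape fail?
How is this different from curl or wget?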
Developer Details
Author
21pounder
License
MIT
Repository
https://github.com/21pounder/terminalAgent/tree/main/deepresearch/.claude/skills/web-scrape
Ref
main
File structure