crawl4ai

Name: crawl4ai
Author: CK991357

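Safe 🌐 Network access ⚙️ External commands

Crawl web pages, with support for screenshots and PDF export

Also available from: smallnest

Web crawling is difficult and time-consuming. Crawl4AI provides 6 smart modes for extracting content, screenshots, and PDFs from any website, and includes anti-detection features.

Supports: Claude, Codex, Code (CC)

⚠️ 67 (Poor)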

Download the skill ZIP

Upload it in Claude

Go to Settings → Capabilities → Skills → Upload skill

Open it and start using it

Test it

Using "crawl4ai". Scrape https://example.com/article and return the main content

Expected result:

The page was successfully scraped. Here is the content:

# Article Title

This is the main content of the article...

Source: https://example.com/article
Words: 1250

Using "crawl4ai". Take a screenshot of https://example.com and save it as PDF

Expected result:

The page screenshot and PDF have been generated. The screenshot shows the full homepage layout with navigation, hero section, and footer content. The PDF document is 5 pages.

Security audit

Safe

v6 • 1/21/2026

All static findings are false positives. The scanner misinterpreted markdown documentation patterns (code fences, example URLs) as security issues. This is a legitimate web scraping tool with no malicious code or intent.

Files scanned

2,919

Lines analyzed

Findings

Total audits

Risk factors

Audited by: claude

Quality score

Architecture

Maintainability

Content

Community

100

Security

Spec compliance

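What you can build

Research and data collection

Automatically crawl documentation sites, blogs, and news articles to build research datasets. Use keyword filtering to focus on relevant content.

Content archiving and evidence collection

Capture web page screenshots and PDFs for legal, compliance, or archival purposes. Produce a visual record of how page content changes over time.

Competitive intelligence

Systematically extract product information, pricing, and specifications from competitor websites. Build a structured database of market intelligence.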

Try these prompts

Basic page scrape

Use crawl4ai to scrape the following URL and return the content in markdown format: {url}

Capture visual evidence

Use crawl4ai to scrape {url} and include both a full-page screenshot and PDF export in your response.

Batch research collection

Use crawl4ai batch_crawl mode to process these URLs: {urls}. Set concurrent_limit to 4 and return all content in markdown format.

Structured data extraction

Use crawl4ai extract mode to pull structured data from {url}. Use this schema: {schema_definition}. Extract using CSS selectors.
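
The listing does not show what a `{schema_definition}` looks like. The sketch below follows the shape used by the upstream Crawl4AI library's CSS extraction schemas (`name`, `baseSelector`, `fields`); whether this skill accepts exactly that shape is an assumption, and the selectors, the `schema_problems` helper, and the product example are hypothetical.

```python
import json

# Hypothetical schema for a product listing page (selectors are illustrative).
schema_definition = {
    "name": "Products",
    "baseSelector": "div.product",  # one match per extracted record
    "fields": [
        {"name": "title", "selector": "h2", "type": "text"},
        {"name": "price", "selector": ".price", "type": "text"},
        {"name": "link", "selector": "a", "type": "attribute", "attribute": "href"},
    ],
}


def schema_problems(schema: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks fine."""
    problems = []
    for key in ("name", "baseSelector", "fields"):
        if key not in schema:
            problems.append(f"missing top-level key: {key}")
    for field in schema.get("fields", []):
        if not {"name", "selector", "type"} <= set(field):
            problems.append(f"incomplete field: {field}")
        if field.get("type") == "attribute" and "attribute" not in field:
            problems.append(f"attribute field without 'attribute': {field.get('name')}")
    return problems


# Serialize the schema for pasting into the {schema_definition} slot of the prompt.
schema_text = json.dumps(schema_definition, indent=2)
```

Checking the schema locally before sending the prompt catches missing keys early, which is cheaper than discovering them after a crawl returns empty records.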

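Best practices

Start with the simple scrape mode before attempting a complex deep crawl
Test the extraction schema on a single page before running a batch
Respect each website's terms of service and put an appropriate delay between requests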
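
One way to space out requests, as the last practice suggests, is a small generator that sleeps between items. This is a minimal sketch: the `paced` helper is hypothetical, the `print` call stands in for the actual skill invocation, and the half-second delay is an arbitrary illustration rather than a recommended value.

```python
import time
from typing import Callable, Iterable, Iterator


def paced(items: Iterable[str], delay_seconds: float,
          sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield items one by one, sleeping `delay_seconds` between consecutive items."""
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_seconds)  # no delay before the first item
        yield item


# Usage: handle one URL at a time, pausing half a second between them.
for url in paced(["https://example.com/a", "https://example.com/b"], delay_seconds=0.5):
    print("would scrape", url)  # placeholder for the actual skill call
```

The `sleep` parameter is injectable so the pacing logic can be tested without waiting.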

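Avoid

Do not omit the parameters wrapper when calling crawl4ai
Do not pass URLs as a single string in batch operations; use an array
Do not attempt LLM-based extraction without a deployed LLM instance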
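
The first two pitfalls can be caught before a request is sent. The following is a minimal sketch, assuming a call payload shaped like `{"mode": ..., "parameters": {...}}`. The skill's real schema lives in SKILL.md and is not reproduced on this page, so the `build_batch_request` helper and the `mode` and `urls` key names are hypothetical; only `parameters`, `batch_crawl`, and `concurrent_limit` appear in the text above.

```python
from typing import Any


def build_batch_request(urls: list[str], concurrent_limit: int = 4) -> dict[str, Any]:
    """Build a batch_crawl payload (hypothetical shape, see the note above)."""
    # A bare string would be treated as a sequence of characters, so reject it early.
    if isinstance(urls, str):
        raise TypeError("urls must be a list of strings, not a single string")
    if not urls:
        raise ValueError("urls must contain at least one URL")
    return {
        "mode": "batch_crawl",
        # All options go inside the `parameters` wrapper, never at the top level.
        "parameters": {
            "urls": list(urls),
            "concurrent_limit": concurrent_limit,
        },
    }


request = build_batch_request(["https://example.com/a", "https://example.com/b"])
```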

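Frequently asked questions

Which modes does crawl4ai support?

crawl4ai supports 6 modes: scrape (single page), deep_crawl (entire site), batch_crawl (multiple URLs), extract (structured data), pdf_export, and screenshot.

How does the smart tiering system work?

Version 1.2 automatically detects the site type and applies the best-fitting configuration: standard for static sites, enhanced for JavaScript sites, and fallback for complex sites.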
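
The mapping in this answer can be written down as a lookup table. This is only an illustration: the tier names `standard`, `enhanced`, and `fallback` come from the answer, while the site-type labels and the `pick_tier` function are hypothetical, and the skill's actual detection logic is not documented here.

```python
# Site type -> configuration tier, as described in the answer above.
# The keys ("static", "javascript", "complex") are illustrative labels.
TIER_BY_SITE_TYPE = {
    "static": "standard",
    "javascript": "enhanced",
    "complex": "fallback",
}


def pick_tier(site_type: str) -> str:
    """Return the tier for a detected site type; unknown types raise ValueError."""
    try:
        return TIER_BY_SITE_TYPE[site_type]
    except KeyError:
        raise ValueError(f"unknown site type: {site_type!r}") from None
```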

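Can I crawl password-protected pages?

No. crawl4ai does not support authentication, so it can only crawl publicly accessible pages.

What is the maximum crawl depth?

Deep crawl supports a configurable max_depth (default 3) and max_pages (default 80). Batch crawl is limited to 20 pages in total.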
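
For URL lists longer than the batch limit, one option is to split them into several batches on the caller's side. The sketch below encodes the defaults quoted above; the `DeepCrawlLimits` class and the `chunk_urls` helper are hypothetical names, not part of the skill.

```python
from dataclasses import dataclass

BATCH_PAGE_LIMIT = 20  # total pages per batch crawl, per the answer above


@dataclass(frozen=True)
class DeepCrawlLimits:
    max_depth: int = 3   # default quoted in the answer above
    max_pages: int = 80  # default quoted in the answer above


def chunk_urls(urls: list[str], size: int = BATCH_PAGE_LIMIT) -> list[list[str]]:
    """Split a URL list into consecutive batches of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


# Usage: 45 URLs become three batches, none larger than the limit.
urls = [f"https://example.com/page/{n}" for n in range(45)]
batches = chunk_urls(urls)
```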

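How are screenshots and PDFs returned?

Binary output is base64-encoded in the JSON response, which makes it easy to handle for AI models that have no file system access.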
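
When a file system is available, the base64 payload can be decoded back into a file. This is a minimal sketch using only the Python standard library; the response layout and the `pdf` field name are assumptions, since the page does not list the actual keys, and the response here is a stand-in built locally.

```python
import base64
import json
import tempfile
from pathlib import Path


def save_binary_field(response_json: str, field: str, out_path: Path) -> int:
    """Decode one base64 field from a JSON response and write it to disk.

    Returns the number of bytes written.
    """
    payload = json.loads(response_json)
    data = base64.b64decode(payload[field], validate=True)
    out_path.write_bytes(data)
    return len(data)


# Stand-in response; a real one would come from the skill.
fake_pdf = b"%PDF-1.4 demo"
response = json.dumps({"pdf": base64.b64encode(fake_pdf).decode("ascii")})

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "page.pdf"
    written = save_binary_field(response, "pdf", target)
    roundtrip = target.read_bytes()
```

Passing `validate=True` makes the decoder reject stray characters instead of silently skipping them, which helps detect a truncated or corrupted payload.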

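Does crawl4ai bypass paywalls or access controls?

No. crawl4ai respects robots.txt, rate limits, and standard access controls. The anti-detection features only keep the crawler from being flagged as automated traffic.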

Developer details

Author

CK991357

License

MIT

Repository

https://github.com/CK991357/gemini-chat/tree/main/src/skills/crawl4ai

Ref

main

File structure

📄 SKILL.md