vaex

Name: vaex
Author: davila7

Safe ⚙️ External commands 🌐 Network access 📁 Filesystem access 🔑 Environment variables

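Process billion-row datasets efficiently

Also available from: K-Dense-AI

Working with datasets larger than RAM causes out-of-memory errors and slow performance. Vaex uses lazy evaluation and memory mapping to work with billions of rows interactively, without loading the data into memory.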

Supports: Claude, Codex, Code (CC)

📊 71 Adequate

Download the skill ZIP

Upload it in Claude

Go to Settings → Capabilities → Skills → Upload skill

Open it and start using it

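Test it

Using "vaex". Load my 10GB sales data file and show the revenue distribution by region

Expected result:

Dataset shape: 150,000,000 rows × 25 columns
Memory usage: 0 bytes (memory-mapped HDF5)
Revenue by region:
• North: $12.5B (average: $245)
• South: $8.3B (average: $198)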

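Using "vaex". Create a BMI virtual column from the height and weight columns

Expected result:

Virtual column created: df['bmi']
Memory overhead: 0 bytes
Formula: df.weight_kg / (df.height_m ** 2)
Ready for aggregation and filtering.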

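Using "vaex". Show the top 10 customers by total purchase amount

Expected result:

Customer analysis:
• Top-spending customer: $1.2M total
• Top 10 customers: $8.5M combined
• Processing time: 0.3 s (lazy evaluation)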

Security audit

Safe

v5 • 1/17/2026

This is a pure documentation skill containing only reference guides and Python code examples for the Vaex library. All 501 static findings are false positives triggered by documentation patterns. The analyzer misinterprets markdown code examples, placeholder credential documentation, and legitimate feature descriptions as security issues. No executable code, network operations, or credential exposure exists.

Files scanned

3,938

Lines analyzed

Findings

Total audits

Risk factors

⚙️ External commands (444)

🌐 Network access (2)

references/io_operations.md:474 skill-report.json:6

📁 Filesystem access (16)

references/io_operations.md:10 references/io_operations.md:13 references/io_operations.md:22 references/io_operations.md:31 references/io_operations.md:39 references/io_operations.md:48 references/io_operations.md:422 references/io_operations.md:427 references/io_operations.md:433 references/io_operations.md:434 references/io_operations.md:692 references/io_operations.md:637 references/io_operations.md:221 references/performance.md:259 references/performance.md:262 skill-report.json:125

🔑 Environment variables (1)

references/io_operations.md:349

Audited by: claude

Quality score

Architecture

100

Maintainability

Content

Community

100

Security

Specification compliance

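What you can build

Analyze massive datasets

Explore and analyze billion-row datasets without running into memory errors or resorting to sampling.

Train models on big data

Build and deploy machine learning pipelines on large datasets that conventional tools cannot handle.

Process time series data

Process large financial time series for risk analysis and forecasting.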

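Try these prompts

Load a large dataset

Use Vaex to load a large HDF5/Parquet file and show basic statistics and column information.

Filter and aggregate

Filter a dataset by condition and run group-by aggregations efficiently.

Create visualizations

Create heatmap or histogram visualizations for a large dataset.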

Build a machine learning pipeline

Use Vaex ML transformers to preprocess features and train an XGBoost model.

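Best practices

Convert CSV files to HDF5 or Arrow format for near-instant loading
Use virtual columns instead of materialized data to save memory
Batch multiple operations with delay=True so they run in a single pass over the data
Filter with selections instead of creating new DataFrames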

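Avoid

Calling .to_pandas_df() on large datasets, which forfeits Vaex's advantages
Converting data to NumPy arrays with .values when it is not needed
Repeatedly exporting to CSV instead of using HDF5/Arrow
Materializing virtual columns without a good reason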

FAQ

How does Vaex handle datasets larger than RAM?

Vaex uses memory-mapped files: the data stays on disk, and only the parts that are accessed get read into memory.

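Which file formats work best with Vaex?

HDF5 and Apache Arrow load almost instantly. CSV is slow for large files.

Can I use Vaex in my pandas code?

The Vaex API resembles pandas, but some operations differ. Full pandas compatibility is not guaranteed.

Is my data safe when I use Vaex?

Vaex never modifies source files. Every transformation creates either virtual columns or new export files.

Why are my operations slow?

Check that you are using HDF5/Arrow rather than CSV. Use delay=True when running multiple aggregations.

How does Vaex compare with Dask or Polars?

Vaex excels at processing billion-row datasets with minimal memory. Dask handles distributed computing, and Polars is faster for in-memory data processing.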

Developer details

Author

davila7

License

MIT

Repository

https://github.com/davila7/claude-code-templates/tree/main/cli-tool/components/skills/scientific/vaex

Ref

main

File structure

📁 references/

📄 core_dataframes.md

📄 data_processing.md

📄 io_operations.md

📄 machine_learning.md

📄 performance.md

📄 visualization.md

📄 SKILL.md