📊

vaex

Name: vaex
Author: K-Dense-AI

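Safe ⚙️ External commands 📁 Filesystem access 🌐 Network access

Analyze massive datasets with Vaex

Also available from: davila7

Working with large tabular datasets that exceed RAM requires specialized tools. Vaex supports out-of-core DataFrame operations, lazy evaluation, and processing speeds of a billion rows per second on datasets larger than memory. It is well suited to astronomical data, financial time series, and large-scale scientific analysis.

Supports: Claude Codex Code(CC)

🥉 72 Bronze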

Download the skill ZIP

Upload it in Claude

Go to Settings → Capabilities → Skills → Upload skill

Turn it on and start using it

Test it

Using "vaex". Load my parquet file and show statistics

Expected result:

DataFrame shape: (10,000,000, 15) rows x columns
Column types: int64 (5), float64 (7), string (3)
Memory usage: 0.5 GB (virtual columns)
Mean age: 34.2 | Std income: 45200.5

Using "vaex". Filter and group data

Expected result:

Filtered to 2.3 million rows (age > 25)
Group by category results:
- Electronics: 450K rows, mean $52,000
- Clothing: 890K rows, mean $31,000
- Home: 960K rows, mean $42,000

Using "vaex". Convert CSV to HDF5 for performance

Expected result:

Original CSV: 15 GB, 45 minutes to load
Converted HDF5: 8 GB, instant loading
Memory-mapped access - zero RAM for exploration

Security audit

Safe

v4 • 1/17/2026

This is a pure documentation skill for the Vaex Python library. All 498 static findings are false positives caused by markdown code block formatting. The scanner misinterpreted backticks in code examples as Ruby/shell commands, flagged memory-mapping as filesystem access, and misidentified DataFrame inspection methods as reconnaissance. No executable code, credential handling, or malicious patterns exist.

Files scanned

6,268

Lines analyzed

Findings

Total audits

Risk factors

⚙️ External commands (7)

SKILL.md:32-178 references/core_dataframes.md:15-156 references/data_processing.md:11-554 references/io_operations.md:19-702 references/machine_learning.md:7-727 references/performance.md:11-570 references/visualization.md:20-612

📁 Filesystem access (3)

references/io_operations.md:10-13 references/io_operations.md:22-48 references/performance.md:259-262

🌐 Network access (2)

references/io_operations.md:474 skill-report.json:6

Audited by: claude

Quality score

Architecture

100

Maintainability

Content

Community

100

Security

Spec compliance

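What you can build

Explore billion-row datasets

Interactively analyze large CSV/HDF5 datasets without memory limits or preprocessing.

Process astronomical data

Handle terabyte-scale scientific datasets with out-of-core computation and lazy evaluation.

Build scalable pipelines

Create feature engineering and ML workflows for datasets that exceed available RAM.

Try these prompts

Load a large dataset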

Use Vaex to open my HDF5 file at data/large_dataset.hdf5 and show its structure, column types, and row count.

Filter and aggregate

Filter the dataset for records where age > 25 and calculate the mean and standard deviation of income grouped by category.

Create a visualization

Create a heatmap showing the relationship between x and y coordinates with 100 bins on each axis.

Build an ML pipeline

Use Vaex ML to create a StandardScaler for features age and income, then apply PCA for dimensionality reduction.

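Best practices

Use HDF5 or Apache Arrow rather than CSV for instant memory-mapped loading
Use virtual columns and expressions so that computations do not materialize data
Use delay=True to batch operations so that multiple aggregations run more efficiently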

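Avoid

Avoid loading entire datasets into RAM; use vaex.open() for memory-mapped access
Do not convert large datasets to pandas; use Vaex operations throughout the pipeline
Avoid many small exports; write in batches and use an efficient format such as HDF5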

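FAQ

How is Vaex different from pandas?

Vaex uses lazy evaluation and memory mapping to work with datasets larger than RAM, without loading everything into memory.

Which file formats does Vaex support?

Vaex supports HDF5, Apache Arrow, Parquet, CSV, and FITS formats, with memory-mapped loading for efficient access.

Can Vaex handle billion-row datasets?

Yes. Vaex can process more than a billion rows per second using optimized C++ operations and out-of-core computation.

Does Vaex support machine learning?

Vaex ML provides transformers, encoders, PCA, and K-means, and integrates with scikit-learn, XGBoost, and LightGBM.

How does lazy evaluation work?

Operations are not executed until their results are needed, which allows efficient batching and minimal memory use.

Can Vaex access cloud storage?

Vaex can read data from S3, GCS, and other cloud storage using protocol prefixes such as s3:// and gs://.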

Developer details

Author

K-Dense-AI

License

MIT license

Repository

https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/vaex

Ref

main

File structure

📁 references/

📄 core_dataframes.md

📄 data_processing.md

📄 io_operations.md

📄 machine_learning.md

📄 performance.md

📄 visualization.md

📄 SKILL.md