vaex
Analyze massive datasets with Vaex
Also available from: davila7
Processing tabular datasets larger than RAM requires specialized tools. Vaex provides out-of-core DataFrame operations with lazy evaluation, processing up to a billion rows per second on data that never fully loads into memory. Well suited to astronomical data, financial time series, and large-scale scientific analysis.
Download the skill ZIP
Upload in Claude
Go to Settings → Capabilities → Skills → Upload skill
Toggle on and start using
Test it
Using "vaex". Load my parquet file and show statistics
Expected outcome:
- DataFrame shape: 10,000,000 rows × 15 columns
- Column types: int64 (5), float64 (7), string (3)
- Memory usage: 0.5 GB (virtual columns)
- Mean age: 34.2 | Std income: 45200.5
Using "vaex". Filter and group data
Expected outcome:
- Filtered to 2.3 million rows (age > 25)
- Group by category results:
  - Electronics: 450K rows, mean $52,000
  - Clothing: 890K rows, mean $31,000
  - Home: 960K rows, mean $42,000
Using "vaex". Convert CSV to HDF5 for performance
Expected outcome:
- Original CSV: 15 GB, 45 minutes to load
- Converted HDF5: 8 GB, instant loading
- Memory-mapped access: exploration needs almost no RAM
Security Audit
Safe
This is a pure documentation skill for the Vaex Python library. All 498 static findings are false positives caused by markdown code-block formatting: the scanner misinterpreted backticks in code examples as Ruby/shell commands, flagged memory-mapping as filesystem access, and misidentified DataFrame inspection methods as reconnaissance. No executable code, credential handling, or malicious patterns are present.
Risk Factors
⚙️ External commands (7)
📁 Filesystem access (3)
🌐 Network access (2)
What You Can Build
Explore billion-row datasets
Analyze massive CSV/HDF5 datasets interactively without memory constraints or preprocessing.
Process astronomical data
Work with terabyte-scale scientific datasets using out-of-core computation and lazy evaluation.
Build scalable pipelines
Create feature engineering and ML workflows that handle datasets exceeding available RAM.
Try These Prompts
Use Vaex to open my HDF5 file at data/large_dataset.hdf5 and show its structure, column types, and row count.
Filter the dataset for records where age > 25 and calculate the mean and standard deviation of income grouped by category.
Create a heatmap showing the relationship between x and y coordinates with 100 bins on each axis.
Use Vaex ML to create a StandardScaler for features age and income, then apply PCA for dimensionality reduction.
Best Practices
- Use HDF5 or Apache Arrow formats for instant memory-mapped loading instead of CSV
- Leverage virtual columns and expressions for computations without materializing data
- Batch operations with delay=True when performing multiple aggregations for efficiency
Avoid
- Avoid loading entire datasets into RAM - use vaex.open() for memory-mapped access
- Do not convert large datasets to pandas - use Vaex operations throughout the pipeline
- Avoid multiple small exports - batch writes and use efficient formats like HDF5