zarr-python
Store large N-dimensional arrays efficiently
Also available from: davila7
Work with large datasets that exceed memory limits. Zarr-python enables chunked, compressed array storage for efficient cloud-native scientific computing workflows.
Download the skill ZIP
Upload in Claude
Go to Settings → Capabilities → Skills → Upload skill
Toggle on and start using
Test it
Using "zarr-python". Create a Zarr array for storing temperature data with 365 time steps, 720 latitudes, and 1440 longitudes.
Expected outcome:
- Created Zarr array at 'temperature.zarr'
- Shape: (365, 720, 1440) | Chunks: (1, 720, 1440) | Dtype: float32
- Compression: Blosc (zstd, level 5) with shuffle filter
- Each chunk contains one complete daily snapshot (~4MB)
- Use z.append() to add new time steps efficiently
Security Audit
Safe
All 227 static findings are false positives. The analyzer misidentified markdown documentation content as security vulnerabilities: backticks in markdown are code formatting, not shell execution; compression codec names (zstd, gzip, lz4) were flagged as cryptographic algorithms but are data-compression codecs; and the URLs are legitimate documentation links. No executable code, shell commands, or cryptographic operations exist in these documentation files.
Risk Factors
⚙️ External commands (2)
🌐 Network access (1)
What You Can Build
Store climate model data
Store terabyte-scale climate data with time dimensions. Enable efficient appending of new timesteps.
Manage model checkpoints
Store large embedding matrices and model weights. Integrate with Dask for distributed training.
Process genomic datasets
Handle multi-terabyte genomic arrays. Use cloud storage for collaboration.
Try These Prompts
Create a Zarr array with shape (10000, 10000), chunks of (1000, 1000), and float32 dtype. Store it at data/my_array.zarr.
Set up a Zarr array stored in S3 with s3fs. Use bucket my-bucket and path data/arrays.zarr.
Load a Zarr array as a Dask array and compute the mean along axis 0 in parallel.
Create a Zarr array optimized for cloud storage: 10MB chunks, consolidated metadata, and sharding enabled.
Best Practices
- Choose chunk sizes of 1-10 MB for optimal I/O performance
- Align chunk shape with your data access pattern (e.g., time-first for time series)
- Consolidate metadata when using cloud storage to reduce latency
Avoid
- Avoid loading entire large arrays into memory; process in chunks instead
- Do not use small chunks (<1MB) as they create excessive metadata overhead
- Avoid frequent writes to the same cloud storage location without synchronization
Frequently Asked Questions
What is the difference between Zarr v2 and v3 formats?
How do I choose the right chunk size?
Can Zarr handle arrays larger than available memory?
What compression should I use?
How does Zarr compare to HDF5?
Can I use Zarr with existing HDF5 files?
Developer Details
Author
K-Dense-AI
License
MIT license
Repository
https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/zarr-python
Ref
main