anndata 🔬

Safe · ⚙️ External commands · 🌐 Network access

Work with AnnData matrices

Also available from: davila7

AnnData provides a standardized data structure for annotated matrices used in single-cell genomics. This skill enables creating, reading, writing, and manipulating .h5ad files with full support for metadata, embeddings, and the scverse ecosystem.

Supports: Claude Code (CC), Codex
🥉 73 Bronze
1. Download the skill ZIP
2. Upload it in Claude: go to Settings → Capabilities → Skills → Upload skill
3. Toggle the skill on and start using it

Test it

Using "anndata". How do I read a 10X Genomics H5 file and convert it to AnnData?

Expected outcome:

  • Use scanpy's sc.read_10x_h5() to read the H5 format directly (anndata itself does not ship a 10X reader)
  • The function handles gene and barcode extraction automatically
  • Optional genome parameter for selecting specific reference when multiple are present

Using "anndata". What is backed mode and when should I use it?

Expected outcome:

  • Backed mode keeps data on disk and loads only accessed portions
  • Use it for datasets larger than available RAM to avoid out-of-memory errors
  • Access metadata and create subsets without loading entire file into memory

Security Audit

Safe
v4 • 1/17/2026

All 397 static findings are FALSE POSITIVES. This skill contains only markdown documentation with Python code examples. The static scanner incorrectly flags backticks in fenced code blocks, URLs in documentation links, and generic programming terms. No executable code, network operations, or credential handling exists. This is a legitimate scientific computing documentation skill for the AnnData Python library.

7 files scanned · 4,567 lines analyzed · 2 findings · 4 total audits

Audited by: claude

Quality Score

Architecture: 45
Maintainability: 100
Content: 87
Community: 30
Security: 100
Spec Compliance: 91

What You Can Build

Single-cell RNA-seq analysis

Load and process 10X Genomics data for single-cell transcriptomics research with proper metadata tracking.

Multi-batch data integration

Combine multiple experimental batches with automatic batch label tracking and conflict resolution.

Deep learning integration

Export data to PyTorch DataLoaders for training neural networks on single-cell expression data.

Try These Prompts

Create AnnData object
Create an AnnData object from a numpy array with observation metadata for cell types and sample IDs.
Read H5AD file
Read an H5AD file in backed mode and filter for high-quality cells based on a quality_score column.
Concatenate batches
Concatenate three AnnData objects along the observation axis with batch labels and inner join.
Optimize memory
Show how to convert string columns to categorical and use sparse matrices for memory efficiency.

Best Practices

  • Use backed mode (backed='r') for datasets larger than available RAM to avoid out-of-memory errors.
  • Convert string columns to categorical with strings_to_categoricals() for 10-50x memory reduction.
  • Store raw data with adata.raw = adata.copy() before filtering to preserve access to unfiltered genes.

Avoid

  • Avoid modifying views directly without copying first, as changes may affect the original object.
  • Do not load entire large datasets into memory when backed mode can provide lazy access.
  • Avoid index misalignment when adding external metadata: align on the cell index with set_index() and an index-based join() rather than relying on row order.

Frequently Asked Questions

What is the difference between backed mode and in-memory mode?
Backed mode keeps data on disk and loads only accessed portions, enabling work with datasets larger than RAM.
How do I combine multiple AnnData objects for different modalities like RNA and protein?
Use Muon (MuData) to combine multiple AnnData objects for different modalities like RNA and protein.
When should I use sparse matrices?
Use sparse matrices when data has more than 50% zeros, common in single-cell count data.
How do I track which batch each cell came from?
Use the label and keys parameters in ad.concat() to add a batch column automatically.
What is the raw attribute for?
raw stores a snapshot of data before filtering, allowing access to original unfiltered genes later.
How do I handle out of memory errors?
Use backed mode, convert to sparse matrices, convert strings to categoricals, or process in chunks.

Developer Details

File structure

πŸ“ references/

πŸ“„ best_practices.md

πŸ“„ concatenation.md

πŸ“„ data_structure.md

πŸ“„ io_operations.md

πŸ“„ manipulation.md

πŸ“„ SKILL.md