geniml

Safe ⚙️ External commands

Analyze genomic intervals with machine learning

Also available from: davila7

Geniml transforms BED files into machine learning embeddings for genomic region analysis. Train models to find patterns in chromatin accessibility, build consensus peak sets, and analyze single-cell ATAC-seq data.

Supports: Claude, Codex, Code(CC)
Score: 71 (Adequate)

1. Download the skill ZIP
2. Upload in Claude: go to Settings → Capabilities → Skills → Upload skill
3. Toggle on and start using

Test it

Using "geniml". Train region2vec on my ATAC-seq peaks and evaluate the embeddings

Expected outcome:

  • Tokenized 15,234 peaks using universe file
  • Trained 100-dimensional embeddings for 8,567 unique regions
  • Silhouette score: 0.72 (good clustering quality)
  • Davies-Bouldin index: 0.85 (low inter-cluster similarity)
  • Generated 2D UMAP for visualization
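
For readers who want to see what a silhouette score like the one above measures, here is a minimal pure-Python sketch. It is not geniml's evaluation code: the function names are hypothetical, and it assumes Euclidean distance and a precomputed cluster label for each embedding.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points.

    For each point, a = mean distance to the other members of its cluster,
    b = smallest mean distance to the members of any other cluster,
    and the coefficient is (b - a) / max(a, b).
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy 2-D "embeddings": two tight, well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = silhouette_score(toy_points, [0, 0, 1, 1])
bad_score = silhouette_score(toy_points, [0, 1, 0, 1])
```

Scores near 1 indicate compact, well-separated clusters; scores below 0 suggest that points sit closer to another cluster than to their own.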

Using "geniml". Build a consensus peak universe from 10 ATAC-seq experiments

Expected outcome:

  • Combined 245,000 peaks from all experiments
  • Applied coverage cutoff method with 5x threshold
  • Generated consensus universe with 32,450 regions
  • Coverage of input peaks: 87.3%
  • Mean region size: 425bp (appropriate for ATAC-seq)
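
The idea behind a coverage cutoff can be sketched in a few lines of plain Python. This is an illustration only, not geniml's implementation: the function name is hypothetical, the "5x threshold" is read here as "covered by at least five input files" (an assumption), and the intervals within each file are assumed not to overlap one another.

```python
from collections import defaultdict


def coverage_cutoff_universe(region_sets, cutoff):
    """Return merged intervals covered by at least `cutoff` region sets.

    region_sets: iterable of lists of (chrom, start, end), half-open as in BED.
    Assumes intervals inside a single set do not overlap each other.
    """
    # Net change in coverage depth at each position, per chromosome.
    deltas = defaultdict(lambda: defaultdict(int))
    for regions in region_sets:
        for chrom, start, end in regions:
            deltas[chrom][start] += 1
            deltas[chrom][end] -= 1

    universe = []
    for chrom in sorted(deltas):
        depth = 0
        open_start = None
        # Sweep left to right, opening a region when depth reaches the cutoff
        # and closing it when depth drops below the cutoff again.
        for pos in sorted(deltas[chrom]):
            depth += deltas[chrom][pos]
            if depth >= cutoff and open_start is None:
                open_start = pos
            elif depth < cutoff and open_start is not None:
                universe.append((chrom, open_start, pos))
                open_start = None
    return universe


# Three toy "experiments" standing in for parsed BED files.
experiments = [
    [("chr1", 100, 200)],
    [("chr1", 150, 250)],
    [("chr1", 180, 300), ("chr2", 0, 50)],
]
consensus = coverage_cutoff_universe(experiments, cutoff=2)
```

Raising the cutoff yields fewer, higher-confidence regions; lowering it keeps more of the input peaks at the cost of including regions seen in only a few experiments.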

Using "geniml". Analyze single-cell ATAC-seq data for cell type annotation

Expected outcome:

  • Pre-tokenized 8,500 cells from PBMC dataset
  • Trained scEmbed model with 100 dimensions
  • Generated cell embeddings for all cells
  • Leiden clustering identified 12 distinct cell populations
  • Annotated major types: T cells, B cells, monocytes, NK cells
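
One way to picture the tokenization step for single-cell data is to treat each cell as a "document" whose "words" are its accessible regions. The sketch below illustrates that idea only; the function name and the `chrom_start_end` token format are hypothetical and do not come from geniml.

```python
def cells_to_region_documents(matrix, regions):
    """Turn a cell-by-region count matrix into one list of region tokens per cell.

    matrix:  list of rows, one per cell; a value > 0 means the region is accessible.
    regions: list of (chrom, start, end), one per matrix column.
    """
    tokens = [f"{chrom}_{start}_{end}" for chrom, start, end in regions]
    documents = []
    for row in matrix:
        if len(row) != len(tokens):
            raise ValueError("row length does not match number of regions")
        documents.append([tok for tok, value in zip(tokens, row) if value > 0])
    return documents


# Toy example: three cells, three candidate regions.
toy_regions = [("chr1", 0, 100), ("chr1", 200, 300), ("chr2", 5, 50)]
toy_matrix = [
    [1, 0, 1],
    [0, 0, 0],
    [0, 2, 0],
]
docs = cells_to_region_documents(toy_matrix, toy_regions)
```

Token lists of this shape are what a word2vec-style model consumes; cells with no accessible regions produce empty documents and are usually worth filtering out beforehand.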

Security Audit

Safe
v4 • 1/17/2026

Static analysis flagged 194 patterns, all of them false positives. The 'external_commands' findings are bash code blocks in the Markdown documentation, not actual shell execution. 'Weak cryptographic' refers to MD5 checksums used for file verification, a legitimate bioinformatics practice. 'Ransomware keywords' was triggered by the security audit text itself. 'Hidden file access' refers to standard cache directories. All flagged patterns belong to legitimate genomic ML workflows.

  • Files scanned: 8
  • Lines analyzed: 2,570
  • Findings: 1
  • Total audits: 4

Audited by: claude

Quality Score

  • Architecture: 45
  • Maintainability: 100
  • Content: 87
  • Community: 21
  • Security: 100
  • Spec Compliance: 91

What You Can Build

Compare ChIP-seq experiments

Train region embeddings to find similar peaks across different transcription factor binding experiments

Cluster cells by chromatin

Use scEmbed to analyze scATAC-seq data and identify cell types based on chromatin accessibility patterns

Build reference peak sets

Create consensus universes from multiple ATAC-seq experiments for standardized analyses

Try These Prompts

Train region embeddings

Help me train region2vec embeddings on my BED files. First tokenize them using a universe file, then train a 100-dimensional embedding model.

Analyze scATAC-seq

Use scEmbed to analyze my scATAC-seq data in scanpy. Tokenize the cells, train an embedding model, and generate UMAP visualization.

Build consensus peaks

Build a consensus universe from my collection of BED files using the coverage cutoff method with 5x threshold.

Joint region-label embeddings

Train BEDspace embeddings on regions with cell type labels to enable cross-modal queries between regions and metadata.

Best Practices

  • Always build high-quality universes with good peak coverage before training embeddings
  • Validate tokenization coverage (aim for more than 80%) and adjust p-value thresholds if coverage falls short
  • Use multiple evaluation metrics to assess embedding quality and biological relevance
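
The coverage check mentioned above can be approximated without any library. The sketch below is not geniml's tokenizer: the function name is hypothetical, "coverage" is taken to mean the fraction of input peaks overlapping at least one universe region (an assumption), and universe regions are assumed not to overlap one another.

```python
from bisect import bisect_left


def tokenization_coverage(peaks, universe):
    """Fraction of peaks that overlap at least one universe region.

    peaks, universe: lists of (chrom, start, end), half-open as in BED.
    Assumes universe regions on the same chromosome do not overlap.
    """
    # Per chromosome: parallel lists of starts and ends, sorted by start.
    index = {}
    for chrom, start, end in sorted(universe):
        starts, ends = index.setdefault(chrom, ([], []))
        starts.append(start)
        ends.append(end)

    if not peaks:
        return 0.0

    hits = 0
    for chrom, start, end in peaks:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        # Last universe region that starts before the peak ends.
        i = bisect_left(starts, end) - 1
        if i >= 0 and ends[i] > start:
            hits += 1
    return hits / len(peaks)


toy_universe = [("chr1", 100, 200), ("chr1", 300, 400)]
toy_peaks = [
    ("chr1", 150, 160),  # inside a universe region
    ("chr1", 200, 300),  # only abuts its neighbours, so no overlap
    ("chr1", 390, 500),  # partial overlap
    ("chr2", 0, 10),     # chromosome absent from the universe
    ("chr1", 50, 101),   # overlaps by a single base
]
coverage = tokenization_coverage(toy_peaks, toy_universe)
```

A value below the 80 percent guideline suggests the universe is a poor match for the data and may need rebuilding before training.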

Avoid

  • Training on low-quality or misaligned peak sets without proper universe building
  • Using default parameters without tuning for your specific data type and scale
  • Skipping evaluation steps - always validate embeddings before downstream analysis

Frequently Asked Questions

What file formats does geniml support?
Geniml works with standard BED files (3+ columns) for genomic regions and CSV files for metadata.
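
As a rough illustration of the "3+ columns" requirement, a minimal BED reader only needs the chromosome, start, and end fields. The sketch below is a standalone helper with a hypothetical name, not a geniml function; it skips blank lines, comments, and `track`/`browser` header lines and ignores any extra columns.

```python
def parse_bed(lines):
    """Yield (chrom, start, end) tuples from BED-formatted text lines."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blanks, comments, and header lines
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"line {line_number}: expected at least 3 columns")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end < start:
            raise ValueError(f"line {line_number}: invalid interval {start}-{end}")
        yield chrom, start, end


example_lines = [
    "track name=example",
    "chr1\t100\t200\tpeak1\t5",
    "",
    "# a comment",
    "chr2 5 50",
]
parsed = list(parse_bed(example_lines))
```

For a file on disk, the same generator can be fed an open file handle, for example `with open(path) as handle: regions = list(parse_bed(handle))`.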
How do I choose embedding dimension?
Start with 100 dimensions for most analyses. Use 50 for small datasets, 200+ for complex multi-label scenarios.
Can I use geniml with other single-cell tools?
Yes, scEmbed outputs integrate seamlessly with scanpy as adata.obsm embeddings for clustering and visualization.
What is the difference between Region2Vec and BEDspace?
Region2Vec trains on regions only. BEDspace jointly embeds regions and metadata labels for cross-modal queries.
How long does training take?
Minutes for small datasets (thousands of regions), hours for large collections. Use GPU for scEmbed on big single-cell data.
Do I need a universe file?
Yes, for tokenization. Build one with consensus peaks or use a reference like ENCODE SCREEN.

Developer Details