🧬

geniml

Name: geniml
Author: K-Dense-AI

Safe ⚙️ External commands

Analyze genomic intervals with machine learning

Also available from: davila7

Geniml transforms BED files into machine learning embeddings for genomic region analysis. Train models to find patterns in chromatin accessibility, build consensus peak sets, and analyze single-cell ATAC-seq data.

Supports: Claude, Codex, Code (CC)

📊 71 Adequate

Download the skill ZIP

Upload in Claude

Go to Settings → Capabilities → Skills → Upload skill

Toggle on and start using

Test it

Using "geniml". Train region2vec on my ATAC-seq peaks and evaluate the embeddings

Expected outcome:

Tokenized 15,234 peaks using universe file
Trained 100-dimensional embeddings for 8,567 unique regions
Silhouette score: 0.72 (good clustering quality)
Davies-Bouldin index: 0.85 (low inter-cluster similarity)
Generated 2D UMAP for visualization

Using "geniml". Build a consensus peak universe from 10 ATAC-seq experiments

Expected outcome:

Combined 245,000 peaks from all experiments
Applied coverage cutoff method with 5x threshold
Generated consensus universe with 32,450 regions
Coverage of input peaks: 87.3%
Mean region size: 425bp (appropriate for ATAC-seq)

Using "geniml". Analyze single-cell ATAC-seq data for cell type annotation

Expected outcome:

Pre-tokenized 8,500 cells from PBMC dataset
Trained scEmbed model with 100 dimensions
Generated cell embeddings for all cells
Leiden clustering identified 12 distinct cell populations
Annotated major types: T cells, B cells, monocytes, NK cells

Security Audit

Safe

v4 • 1/17/2026

Static analysis flagged 194 patterns, all of which are false positives. The 'external_commands' findings are markdown bash code blocks in documentation, not actual shell execution. 'Weak cryptographic' refers to MD5 checksums used for file verification, a legitimate bioinformatics practice. 'Ransomware keywords' is a false positive triggered by the security audit text itself. 'Hidden file access' refers to standard cache directories. All patterns represent legitimate genomic ML workflows.

Files scanned

2,570

Lines analyzed

findings

Total audits

Risk Factors

⚙️ External commands (6)

references/bedspace.md:23-30
references/consensus_peaks.md:21-23
references/utilities.md:19-30
references/scembed.md:23-38
references/region2vec.md:25-33
SKILL.md:19-33

Audited by: claude

Quality Score

Architecture

100

Maintainability

Content

Community

100

Security

Spec Compliance

What You Can Build

Compare ChIP-seq experiments

Train region embeddings to find similar peaks across different transcription factor binding experiments

Cluster cells by chromatin

Use scEmbed to analyze scATAC-seq data and identify cell types based on chromatin accessibility patterns

Build reference peak sets

Create consensus universes from multiple ATAC-seq experiments for standardized analyses
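
The idea behind a coverage cutoff for consensus universes can be sketched in plain Python. This is an illustrative reading of the method, not geniml's implementation: the function names `merge_regions` and `coverage_cutoff_universe` are hypothetical, and regions are assumed to be 0-based, half-open `(chrom, start, end)` tuples. Each input set counts at most once per position, and every stretch covered by at least `cutoff` sets is kept.

```python
from collections import defaultdict


def merge_regions(regions):
    """Merge overlapping or touching regions within a single input set."""
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def coverage_cutoff_universe(bed_sets, cutoff):
    """Return regions covered by at least `cutoff` of the input sets.

    Hypothetical helper for illustration only.
    """
    # Net change in coverage depth at each position, per chromosome.
    deltas = defaultdict(lambda: defaultdict(int))
    for regions in bed_sets:
        # Merging first ensures one set adds at most 1 to the depth.
        for chrom, start, end in merge_regions(regions):
            deltas[chrom][start] += 1
            deltas[chrom][end] -= 1

    universe = []
    for chrom in sorted(deltas):
        depth = 0
        region_start = None
        # Sweep left to right, opening a region when depth reaches the
        # cutoff and closing it when depth falls below the cutoff.
        for pos in sorted(deltas[chrom]):
            depth += deltas[chrom][pos]
            if depth >= cutoff and region_start is None:
                region_start = pos
            elif depth < cutoff and region_start is not None:
                universe.append((chrom, region_start, pos))
                region_start = None
    return universe


example_sets = [
    [("chr1", 100, 200)],
    [("chr1", 150, 250)],
    [("chr1", 180, 300), ("chr2", 0, 50)],
]
print(coverage_cutoff_universe(example_sets, cutoff=2))
```

A higher cutoff yields a smaller, more conservative universe; a cutoff of 1 reduces to the union of all input peaks.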

Try These Prompts

Train region embeddings

Help me train region2vec embeddings on my BED files. First tokenize them using a universe file, then train a 100-dimensional embedding model.

Analyze scATAC-seq

Use scEmbed to analyze my scATAC-seq data in scanpy. Tokenize the cells, train an embedding model, and generate UMAP visualization.

Build consensus peaks

Build a consensus universe from my collection of BED files using the coverage cutoff method with 5x threshold.

Joint region-label embeddings

Train BEDspace embeddings on regions with cell type labels to enable cross-modal queries between regions and metadata.

Best Practices

Always build high-quality universes with good peak coverage before training embeddings
Validate tokenization coverage (greater than 80 percent) and adjust p-value thresholds if needed
Use multiple evaluation metrics to assess embedding quality and biological relevance
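
One way to check tokenization coverage before training is sketched below. It assumes that "coverage" means the fraction of input peaks that overlap at least one universe region, which is an interpretation rather than a definition taken from geniml. The helper names `build_index` and `tokenization_coverage` are hypothetical, and regions are 0-based, half-open `(chrom, start, end)` tuples.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(universe):
    """Index universe regions per chromosome for fast overlap lookups."""
    by_chrom = defaultdict(list)
    for chrom, start, end in universe:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, regions in by_chrom.items():
        regions.sort()
        starts = [s for s, _ in regions]
        # Running maximum of ends, so lookups stay correct even if
        # universe regions overlap each other.
        max_ends = []
        running = 0
        for _, end in regions:
            running = max(running, end)
            max_ends.append(running)
        index[chrom] = (starts, max_ends)
    return index


def tokenization_coverage(peaks, universe):
    """Fraction of peaks overlapping at least one universe region."""
    if not peaks:
        return 0.0
    index = build_index(universe)
    hits = 0
    for chrom, start, end in peaks:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        # Universe regions at positions < i start before the peak ends.
        i = bisect_left(starts, end)
        if i > 0 and max_ends[i - 1] > start:
            hits += 1
    return hits / len(peaks)


universe = [("chr1", 100, 200), ("chr1", 500, 600)]
peaks = [
    ("chr1", 150, 160),  # inside the first universe region
    ("chr1", 200, 300),  # touches but does not overlap (half-open)
    ("chr1", 590, 700),  # overlaps the second universe region
    ("chr2", 0, 10),     # chromosome absent from the universe
]
coverage = tokenization_coverage(peaks, universe)
print(f"coverage: {coverage:.0%}")
```

If the result falls below the 80 percent guideline, the universe is likely a poor fit for the data and may need rebuilding before training.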

Avoid

Training on low-quality or misaligned peak sets without proper universe building
Using default parameters without tuning for your specific data type and scale
Skipping evaluation steps: always validate embeddings before downstream analysis

Frequently Asked Questions

What file formats does geniml support?

Geniml works with standard BED files (3+ columns) for genomic regions and CSV files for metadata.
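
For readers unfamiliar with the format, a minimal BED reader might look like the sketch below. It is not part of geniml; `parse_bed` is a hypothetical helper that keeps the three required columns (chromosome, 0-based start, exclusive end) and passes any extra columns through untouched.

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end, extra_fields) tuples."""
    regions = []
    for line_number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Skip blank lines, comments, and genome browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"line {line_number}: expected at least 3 columns")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end < start:
            raise ValueError(f"line {line_number}: invalid interval {start}-{end}")
        regions.append((chrom, start, end, fields[3:]))
    return regions


example_lines = [
    "# peaks from a hypothetical experiment",
    "chr1\t100\t200\tpeak1\t5",
    "",
    "chr2\t0\t50",
]
for region in parse_bed(example_lines):
    print(region)
```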

How do I choose embedding dimension?

Start with 100 dimensions for most analyses. Use 50 for small datasets and 200 or more for complex multi-label scenarios.

Can I use geniml with other single-cell tools?

Yes, scEmbed outputs integrate seamlessly with scanpy as adata.obsm embeddings for clustering and visualization.

What is the difference between Region2Vec and BEDspace?

Region2Vec trains on regions only. BEDspace jointly embeds regions and metadata labels for cross-modal queries.

How long does training take?

Minutes for small datasets (thousands of regions), hours for large collections. Use GPU for scEmbed on big single-cell data.

Do I need a universe file?

Yes, for tokenization. Build one with consensus peaks or use a reference like ENCODE SCREEN.

Developer Details

Author

K-Dense-AI

License

BSD-2-Clause license

Repository

https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/geniml

Ref

main

File structure

📁 references/

📄 bedspace.md

📄 consensus_peaks.md

📄 region2vec.md

📄 scembed.md

📄 utilities.md

📄 evaluation.json

📄 SKILL.md