🧬

lamindb

Name: lamindb
Author: K-Dense-AI

Safe · ⚙️ External commands · 📁 Filesystem access · 🌐 Network access · 🔑 Env variables

Manage biological data with LaminDB

Also available from: davila7

Biological research generates complex datasets that are difficult to track, query, and reproduce. LaminDB provides a unified framework for managing biological data with automatic lineage tracking, ontology-based annotations, and seamless integration with workflow managers.

Supports: Claude, Codex, Code (CC)

📊 71 Adequate

Download the skill ZIP

Upload in Claude

Go to Settings → Capabilities → Skills → Upload skill

Toggle on and start using

Test it

Using "lamindb". How do I track my notebook analysis with LaminDB?

Expected outcome:

Use ln.track() at the start of your notebook to begin lineage capture
Import your data and perform analysis as normal
Call ln.finish() to complete tracking when done
View lineage with artifact.view_lineage() to see data provenance
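The track/analyze/finish sequence above can be sketched as a small wrapper. This is only an illustration: `run_tracked` is a hypothetical helper, not part of LaminDB, and it assumes `ln` is the imported `lamindb` module already connected to an instance.

```python
from typing import Any, Callable


def run_tracked(ln: Any, analysis: Callable[[], Any]) -> Any:
    """Run `analysis` between ln.track() and ln.finish().

    `ln` is assumed to be the lamindb module (import lamindb as ln).
    `run_tracked` itself is a hypothetical helper for illustration.
    """
    ln.track()           # begin lineage capture for this notebook or script
    result = analysis()  # load data and analyze as usual
    ln.finish()          # mark the run as finished
    return result
```

In a real session this would be called as `run_tracked(ln, my_analysis)` after `import lamindb as ln`, where `my_analysis` is your own function. If the analysis raises, `ln.finish()` is not reached, so a failed run is not marked as finished.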

Using "lamindb". Can you help me validate my experimental metadata?

Expected outcome:

Define a schema with required columns and data types
Create a DataFrameCurator or AnnDataCurator with your schema
Use curator.validate() to check data integrity
Use .cat.standardize() to fix typos and map synonyms
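To make the validate-then-standardize idea concrete, here is a plain-Python sketch that does not use LaminDB at all. The schema, the synonym table, and both function names are assumptions made up for this example; the real curators validate against registries and ontologies rather than a hard-coded dictionary.

```python
from typing import Any

# Hypothetical schema: required column -> expected Python type.
SCHEMA = {"sample_id": str, "cell_type": str, "n_cells": int}

# Hypothetical synonym table mapping lowercase variants to a canonical label.
SYNONYMS = {"t-cell": "T cell", "t cell": "T cell", "b-cell": "B cell"}


def validate(records: list[dict[str, Any]]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for i, row in enumerate(records):
        for column, expected in SCHEMA.items():
            if column not in row:
                problems.append(f"row {i}: missing column '{column}'")
            elif not isinstance(row[column], expected):
                problems.append(f"row {i}: '{column}' is not {expected.__name__}")
    return problems


def standardize(label: str) -> str:
    """Map a known synonym to its canonical label; leave other labels unchanged."""
    return SYNONYMS.get(label.strip().lower(), label)
```

Validation reports problems without changing the data, while standardization rewrites values, which is why the two are separate steps.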

Using "lamindb". How do I connect LaminDB to my cloud storage?

Expected outcome:

Install extras: pip install 'lamindb[aws]' or 'lamindb[gcp]'
Configure storage: lamin init --storage s3://your-bucket
Set credentials via environment variables or config files
LaminDB handles caching and sync automatically
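For the environment-variable route with S3, a quick pre-flight check might look like the sketch below. `missing_aws_vars` is a hypothetical helper, not a LaminDB function; the two variable names are the standard ones read by AWS SDKs.

```python
import os
from collections.abc import Mapping

# Standard AWS credential variables read by AWS SDKs.
AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the AWS credential variables that are unset or empty in `env`."""
    return [name for name in AWS_VARS if not env.get(name)]
```

A non-empty result does not necessarily mean access will fail, since credentials can also come from config files, as noted above.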

Security Audit

Safe

v4 • 1/17/2026

This is a pure documentation skill containing only markdown files with code examples for LaminDB biological data management. All 607 static findings are false positives. The analyzer incorrectly flagged markdown code formatting (backticks, code blocks), documentation about cloud storage configuration (AWS, GCP credentials), and library usage patterns (ln.Artifact) as security issues. No executable code, scripts, credential harvesting, or malicious patterns exist.

Files scanned

Lines analyzed: 6,559

findings

Total audits

Risk Factors

⚙️ External commands (3)

references/annotation-validation.md:21-32 SKILL.md:49-61 references/setup-deployment.md:9-21

📁 Filesystem access (2)

references/core-concepts.md:12-24 SKILL.md:195-226

🌐 Network access (2)

references/integrations.md:46-72 SKILL.md:383-387

🔑 Env variables (2)

references/integrations.md:49-50 references/setup-deployment.md:205-206

Audited by: claude

Quality Score

Architecture

100

Maintainability

Content

Community

100

Security

Spec Compliance

What You Can Build

Annotate scRNA-seq data

Validate and standardize cell type annotations using controlled vocabularies from Cell Ontology

Build data lakehouses

Create unified query interfaces across multiple biological datasets with automatic versioning

Track model lineage

Link training data artifacts to MLflow or W&B experiments for full reproducibility

Try These Prompts

Get started

Help me set up LaminDB locally. I want to install it, authenticate, and initialize a local instance for managing my single-cell datasets.

Annotate data

I have scRNA-seq data with cell type labels. Show me how to validate and standardize these labels using the Cell Ontology via Bionty.

Track lineage

I run Nextflow pipelines for bulk RNA-seq analysis. Show me how to integrate LaminDB to track which code produced which output files.

Query data

I have hundreds of Parquet files organized by experiment and batch. Show me how to query all artifacts from project X with tissue=PBMC and condition=treated without loading all files.

Best Practices

Start every analysis notebook with ln.track() and end with ln.finish() for automatic lineage capture
Define schemas and validate data early to catch issues before extensive analysis
Use hierarchical artifact keys like 'project/experiment/batch/file.h5ad' for organization
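One way to keep hierarchical keys consistent is to build them in a single place. The helper below is a hypothetical sketch (the function name and the four-level layout are assumptions taken from the example key above), not a LaminDB API.

```python
import posixpath


def artifact_key(project: str, experiment: str, batch: str, filename: str) -> str:
    """Join the parts into a 'project/experiment/batch/file' style key."""
    parts = [project, experiment, batch, filename]
    for part in parts:
        # Reject empty parts and embedded slashes so the hierarchy stays four levels deep.
        if not part or "/" in part:
            raise ValueError(f"invalid key component: {part!r}")
    return posixpath.join(*parts)
```

The resulting string would then be supplied as the artifact's key when saving a file. Saving a modified file under the same key, rather than inventing a new one, leaves versioning to LaminDB.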

Avoid

Creating new artifact keys for modified versions instead of using built-in versioning
Loading large datasets without filtering; query metadata first to reduce I/O
Skipping ontology standardization, which leads to inconsistent queries across similar terms
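The "query metadata first" advice can be illustrated with a plain-Python sketch. `ArtifactRecord` and `select` are hypothetical stand-ins for a metadata registry and its filter; in LaminDB the filtering would be done by a registry query rather than a list comprehension.

```python
from dataclasses import dataclass, field


@dataclass
class ArtifactRecord:
    """Hypothetical stand-in for a registry row describing one stored file."""
    key: str
    metadata: dict[str, str] = field(default_factory=dict)


def select(records: list[ArtifactRecord], **criteria: str) -> list[ArtifactRecord]:
    """Keep records whose metadata matches every criterion; no file is opened."""
    return [
        r for r in records
        if all(r.metadata.get(k) == v for k, v in criteria.items())
    ]
```

Only the records returned by `select` would then be loaded, so the cost of reading files scales with the number of matches rather than with the size of the whole collection.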

Frequently Asked Questions

What data formats does LaminDB support?

LaminDB supports DataFrames (Parquet, CSV), AnnData (single-cell), MuData (multi-modal), SpatialData, and TileDB-SOMA arrays.

Do I need a server to use LaminDB?

No. LaminDB works locally with SQLite for development. For production teams, scale to cloud storage with PostgreSQL.

How does LaminDB integrate with Nextflow?

Use ln.track() in process scripts to record inputs and outputs. LaminDB captures provenance automatically for each step.

What biological ontologies are available?

Genes (Ensembl), Proteins (UniProt), Cell types (CL), Tissues (Uberon), Diseases (Mondo), Phenotypes (HPO), and Pathways (GO).

Can I use LaminDB without internet?

Yes, for local operations. Initial ontology downloads and cloud storage access require internet, so cache ontologies locally for offline use.

How is LaminDB different from a database?

LaminDB combines database features (querying, filtering) with versioned file storage and lineage tracking specialized for scientific data workflows.

Developer Details

Author

K-Dense-AI

License

Apache-2.0 license

Repository

https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/lamindb

Ref

main

File structure

📁 references/

📄 annotation-validation.md

📄 core-concepts.md

📄 data-management.md

📄 integrations.md

📄 ontologies.md

📄 setup-deployment.md

📄 evaluation_output.json

📄 SKILL.md