🧬

geo-database

Low Risk · 🌐 Network access · ⚙️ External commands · 📁 Filesystem access

Access NCBI GEO gene expression data

Also available from: davila7

Researchers need efficient access to gene expression datasets for analysis. This skill enables querying, downloading, and analyzing data from NCBI's GEO database containing millions of genomics samples.

Supports: Claude, Codex, Code (CC)
⚠️ 68 Poor

1. Download the skill ZIP
2. Upload in Claude: go to Settings → Capabilities → Skills → Upload skill
3. Toggle on and start using

Test it

Using "geo-database". Search for diabetes gene expression datasets in humans

Expected outcome:

  • Found 1,247 datasets matching 'diabetes AND Homo sapiens'
  • Top results:
      - GSE12345: Type 2 diabetes gene expression (47 samples)
      - GSE67890: Diabetic nephropathy study (32 samples)
      - GSE11111: Insulin response time course (24 samples)
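
A search like the one above could be expressed as an NCBI E-utilities ESearch request against the GEO DataSets database (`db=gds`). The sketch below only builds the request URL and does not contact NCBI; the function name `build_geo_search_url` is hypothetical, and restricting results to series with `gse[Entry Type]` is an assumption about what the skill does, not something the listing states.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(keyword, organism, retmax=20):
    """Build an ESearch URL for GEO series matching a keyword and an organism.

    Hypothetical helper for illustration; it only assembles the URL.
    """
    # Field tags narrow each term; gse[Entry Type] keeps only series records.
    term = f'{keyword} AND "{organism}"[Organism] AND gse[Entry Type]'
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_geo_search_url("diabetes", "Homo sapiens", retmax=5)
print(url)
```

The returned URL can be fetched with any HTTP client; the JSON response lists record IDs that a follow-up ESummary call would turn into titles and sample counts.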

Using "geo-database". Download GSE12345 and extract metadata

Expected outcome:

  • Downloaded GSE12345_series_matrix.txt.gz (145 MB)
  • Dataset summary:
      - Title: Transcriptome profiling of diabetic vs normal kidney
      - Samples: 20 (10 diabetic, 10 control)
      - Platform: GPL570 (Affymetrix Human Genome U133 Plus 2.0)
      - Organism: Homo sapiens
      - Submission date: 2023-06-15

Security Audit

Low Risk
v4 • 1/17/2026

Documentation-only skill for accessing NCBI GEO database. Static analysis flagged 256 pattern-based issues but all are false positives. The 'backtick execution' findings are markdown code block syntax, not actual shell commands. Network operations are legitimate NCBI API access. FTP downloads target public GEO data repositories. Optional API key usage follows NCBI best practices. No executable code present - only documentation.

  • Files scanned: 3
  • Lines analyzed: 1,878
  • Findings: 3
  • Total audits: 4

Audited by: claude

Quality Score

  • Architecture: 41
  • Maintainability: 100
  • Content: 87
  • Community: 29
  • Security: 90
  • Spec Compliance: 74

What You Can Build

Analyze gene expression in disease

Download and compare gene expression data between healthy and diseased tissue samples to identify biomarkers.

Meta-analysis across studies

Combine data from multiple GEO studies to increase statistical power for detecting gene expression changes.

Build predictive models

Use GEO expression data to train machine learning models for drug response prediction or patient stratification.

Try These Prompts

Search GEO datasets

Search GEO for human breast cancer gene expression datasets from the last 5 years. Show the top 5 results with sample counts and platforms used.

Download expression data

Download the expression matrix and metadata for GSE12345. Save the files to ./data/ and show a summary of the dataset including number of samples and genes.

Differential expression

Perform differential expression analysis on GSE12345 comparing treatment vs control samples. Use limma or t-test and show the top 10 most significant genes.

Batch processing

Download and process these 3 GEO series: GSE100001, GSE100002, GSE100003. Extract expression data and create a summary table with study metadata.

Best Practices

  • Always set your email when using NCBI E-utilities (required by NCBI policy)
  • Obtain a free API key from NCBI for increased rate limits (10 req/s vs 3 req/s)
  • Cache downloaded GEO files locally to avoid repeated downloads
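
The three practices above can be sketched with the standard library alone. This is a minimal illustration, not the skill's own code: the helper names, the `tool` value, and the `NCBI_API_KEY` environment variable name are all assumptions, and `fetch` stands in for whatever download function you actually use.

```python
import os
from pathlib import Path


def entrez_params(email, tool="geo-database-example"):
    """Common E-utilities parameters; the API key comes from the environment."""
    params = {"email": email, "tool": tool}
    api_key = os.environ.get("NCBI_API_KEY")  # hypothetical variable name
    if api_key:
        params["api_key"] = api_key
    return params


def min_interval(params):
    """Seconds to wait between requests: 10 req/s with a key, 3 req/s without."""
    return 1 / 10 if "api_key" in params else 1 / 3


def cached_download(url, cache_dir, fetch):
    """Return the local copy of url, calling fetch(url) -> bytes only on a miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rsplit("/", 1)[-1]  # cache key is the file name
    if not target.exists():
        target.write_bytes(fetch(url))
    return target
```

A caller would sleep for `min_interval(params)` seconds between consecutive E-utilities requests and route every file download through `cached_download`, so that repeated runs reuse files already on disk.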

Avoid

  • Do not download the entire GEO database; be selective with accessions
  • Do not hardcode API keys in shared or version-controlled code
  • Do not ignore sample metadata when interpreting expression data

Frequently Asked Questions

Do I need an API key for GEO access?
An API key is optional but recommended. Without a key you are limited to 3 requests per second; with a key, 10 requests per second. Get a free key at ncbi.nlm.nih.gov/account/
What is the difference between GSE, GSM, and GPL?
GSE is the complete study (series), GSM is an individual sample, GPL is the microarray or sequencing platform. Use GSE for full datasets.
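
As a small illustration of the three prefixes, the sketch below maps an accession to its record type. `classify_accession` is a hypothetical helper; it covers only the three types named in the answer above and rejects anything else.

```python
import re

# Only the three record types described above are handled here.
ACCESSION_TYPES = {"GSE": "series", "GSM": "sample", "GPL": "platform"}


def classify_accession(accession):
    """Return 'series', 'sample', or 'platform' for a GEO accession string."""
    match = re.fullmatch(r"(GSE|GSM|GPL)(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GSE/GSM/GPL accession: {accession!r}")
    return ACCESSION_TYPES[match.group(1)]


print(classify_accession("GSE12345"))  # series
print(classify_accession("GPL570"))    # platform
```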
Why is expression data missing for some series?
Older submissions may lack series matrix files. In that case, download the family SOFT file or parse the individual sample tables to get the complete data.
How do I handle very large GEO datasets?
Use FTP downloads for bulk data, process files in chunks rather than loading them whole, and store the values in sparse matrices when most entries are zero.
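
Chunked processing might look like the sketch below, which streams a gzipped series matrix file and yields a fixed number of rows at a time instead of loading the whole table. The function names and chunk size are illustrative assumptions; the sketch relies on the `!series_matrix_table_begin` and `!series_matrix_table_end` marker lines that delimit the expression table in series matrix files.

```python
import gzip


def _to_float(text):
    """Convert a cell to float, returning None for blank or non-numeric cells."""
    try:
        return float(text)
    except ValueError:
        return None


def iter_matrix_chunks(source, chunk_size=1000):
    """Yield (sample_ids, rows) from a gzipped series matrix, chunk_size rows at a time.

    source is a path or a binary file object; each row is (probe_id, values).
    """
    sample_ids, rows, in_table = None, [], False
    with gzip.open(source, "rt") as text:
        for line in text:
            line = line.rstrip("\r\n")
            if line.startswith("!series_matrix_table_begin"):
                in_table = True
                continue
            if line.startswith("!series_matrix_table_end"):
                break
            if not in_table or not line:
                continue  # skip metadata lines before the table
            fields = [cell.strip('"') for cell in line.split("\t")]
            if sample_ids is None:
                sample_ids = fields[1:]  # header row: ID_REF, then sample columns
                continue
            rows.append((fields[0], [_to_float(v) for v in fields[1:]]))
            if len(rows) >= chunk_size:
                yield sample_ids, rows
                rows = []
    if rows:
        yield sample_ids, rows
```

Each chunk can be summarized or written out before the next one is read, so peak memory depends on `chunk_size` rather than on the size of the file.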
Can I use GEO data for clinical research?
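Yes. GEO data is publicly available and NCBI places no restrictions on its use or distribution, although individual submitters may claim rights over data they deposited. Always cite the original studies and verify data quality before clinical applications.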
What file format should I use for expression data?
Series matrix files are fastest for expression values. Use SOFT for complete metadata. Use MINiML for XML-based processing needs.
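
For reference, the three formats live in sibling directories on the GEO download server. The sketch below builds the download URL for each, assuming the usual layout in which the last three digits of the accession are replaced by `nnn` in the parent directory name. `series_file_url` is a hypothetical helper, and series that span several platforms may use per-platform matrix file names that this sketch does not cover.

```python
GEO_BASE = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def series_file_url(gse, fmt="matrix"):
    """Return the download URL for a series in 'matrix', 'soft', or 'miniml' format."""
    digits = gse[3:]
    if not (gse.startswith("GSE") and digits.isdigit()):
        raise ValueError(f"not a GSE accession: {gse!r}")
    # Parent directory: GSE12345 -> GSE12nnn, GSE1 -> GSEnnn
    stub = "GSE" + digits[:-3] + "nnn"
    names = {
        "matrix": f"matrix/{gse}_series_matrix.txt.gz",
        "soft": f"soft/{gse}_family.soft.gz",
        "miniml": f"miniml/{gse}_family.xml.tgz",
    }
    if fmt not in names:
        raise ValueError(f"unknown format: {fmt!r}")
    return f"{GEO_BASE}/{stub}/{gse}/{names[fmt]}"


print(series_file_url("GSE12345"))
print(series_file_url("GSE12345", "soft"))
```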