pydeseq2
Analyze RNA-seq differential gene expression with PyDESeq2
Also available from: davila7
PyDESeq2 enables differential gene expression analysis from bulk RNA-seq count data. Perform statistical testing, multiple comparison correction, and generate publication-ready volcano and MA plots for your genomics research.
Download the skill ZIP
Upload in Claude
Go to Settings → Capabilities → Skills → Upload skill
Toggle on and start using
Test it
Using "pydeseq2". Analyze my RNA-seq data and show top differentially expressed genes
Expected outcome:
- Analysis complete. Found 847 significant genes (padj < 0.05)
- Top upregulated genes:
  - GeneX: log2FC = 4.2, padj = 1.3e-15
  - GeneY: log2FC = 3.8, padj = 2.7e-12
  - GeneZ: log2FC = 3.5, padj = 5.1e-11
- Top downregulated genes:
  - GeneA: log2FC = -3.9, padj = 8.2e-14
  - GeneB: log2FC = -3.1, padj = 3.4e-10
- Results saved to deseq2_results.csv
Security Audit
Safe
All 429 static findings are false positives. The 'weak cryptographic algorithm' flags incorrectly match 'DES' in 'DESeq2' (a statistical method name, not cryptography). The 'external_commands' flags misinterpret markdown code fences as shell execution. Filesystem access is standard data I/O for bioinformatics workflows. Network access involves only documentation URLs. This is a legitimate scientific computing skill with no malicious code.
Risk Factors
📁 Filesystem access (2)
🌐 Network access (1)
What You Can Build
Compare treated vs control
Identify differentially expressed genes between experimental conditions using proper statistical testing and FDR correction for publication-ready results.
RNA-seq thesis analysis
Process RNA-seq count data, perform differential expression analysis, and generate publication-quality figures for thesis or research papers.
Batch RNA-seq processing
Automate differential expression analysis across multiple conditions or timepoints using the included command-line script.
Try These Prompts
Load my RNA-seq data from counts.csv and metadata.csv, then perform differential expression analysis comparing treated vs control samples using PyDESeq2
Analyze my RNA-seq data accounting for batch effects using design formula ~batch + condition, then test for treatment vs control differences
Run PyDESeq2 analysis on my data and create volcano and MA plots highlighting significant genes with padj < 0.05
Load RNA-seq data, filter genes with fewer than 20 total counts, use multi-factor design ~age + sex + condition, and identify genes with |log2FC| > 1 and padj < 0.01
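The thresholds in the last prompt (|log2FC| > 1 and padj < 0.01) could be applied to a results table roughly as follows. This is a minimal sketch in plain Python, not the skill's own code: the `select_significant` helper and the example rows are hypothetical, and the column names `log2FoldChange` and `padj` are assumed to follow the DESeq2-style results table. In practice the results would usually live in a pandas DataFrame rather than a list of dicts.

```python
import math


def select_significant(results, lfc_cutoff=1.0, padj_cutoff=0.01):
    """Keep rows with |log2FoldChange| > lfc_cutoff and padj < padj_cutoff.

    `results` is a list of dicts, one per gene (hypothetical layout).
    Rows whose padj or log2FoldChange is missing or NaN are skipped.
    """
    hits = []
    for row in results:
        lfc = row.get("log2FoldChange")
        padj = row.get("padj")
        if lfc is None or padj is None:
            continue
        if math.isnan(lfc) or math.isnan(padj):
            continue
        if abs(lfc) > lfc_cutoff and padj < padj_cutoff:
            hits.append(row)
    # Most significant genes first.
    return sorted(hits, key=lambda r: r["padj"])


# Made-up rows, for illustration only.
example_results = [
    {"gene": "g1", "log2FoldChange": 2.5, "padj": 1e-5},
    {"gene": "g2", "log2FoldChange": -1.8, "padj": 0.003},
    {"gene": "g3", "log2FoldChange": 0.4, "padj": 1e-8},   # effect too small
    {"gene": "g4", "log2FoldChange": 3.0, "padj": float("nan")},  # no padj
    {"gene": "g5", "log2FoldChange": -2.2, "padj": 0.04},  # padj too large
]

significant = select_significant(example_results)
print([row["gene"] for row in significant])
```

Skipping NaN values explicitly matters because a comparison such as `nan < 0.01` is simply false, which would otherwise hide the fact that a gene had no adjusted p-value at all.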
Best Practices
- Transpose the count matrix if genes are rows (use .T to get the samples × genes layout)
- Filter low-count genes before analysis to improve statistical power
- Use adjusted p-values (padj), not raw p-values, to determine significance
- Check that sample names match exactly between counts and metadata files
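The data-preparation points above (orientation, low-count filtering, matching sample names) can be sketched in plain Python. This is an illustration under stated assumptions, not the skill's implementation: `prepare_counts` is a hypothetical helper, the `min_total=10` threshold is an arbitrary example value, and real workflows would normally do the same with pandas DataFrames (where the transpose is just `.T`).

```python
def prepare_counts(genes_by_samples, sample_names, metadata, min_total=10):
    """Return (samples_by_genes, kept_genes) ready for a samples x genes workflow.

    genes_by_samples: dict mapping gene -> list of counts, ordered like sample_names
    metadata:         dict mapping sample name -> dict of covariates
    min_total:        minimum summed count for a gene to be kept (example value)
    """
    # 1. Sample names must match exactly between counts and metadata.
    mismatched = set(sample_names) ^ set(metadata)
    if mismatched:
        raise ValueError(
            f"Sample names differ between counts and metadata: {sorted(mismatched)}"
        )

    # 2. Drop genes whose total count across all samples is below the threshold.
    kept_genes = [g for g, row in genes_by_samples.items() if sum(row) >= min_total]

    # 3. Transpose from genes x samples to samples x genes.
    samples_by_genes = {
        sample: {g: genes_by_samples[g][i] for g in kept_genes}
        for i, sample in enumerate(sample_names)
    }
    return samples_by_genes, kept_genes


# Made-up example data.
counts = {"GeneA": [5, 8, 0], "GeneB": [1, 0, 2], "GeneC": [30, 12, 9]}
samples = ["s1", "s2", "s3"]
meta = {
    "s1": {"condition": "control"},
    "s2": {"condition": "treated"},
    "s3": {"condition": "treated"},
}

table, kept = prepare_counts(counts, samples, meta)
print(kept)
print(table["s2"])
```

Raising on any name mismatch, rather than silently taking the intersection, surfaces typos in sample IDs before they turn into dropped samples.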
Avoid
- Never call genes significant from raw p-values when many genes are tested; use the FDR-corrected padj values
- Do not apply LFC shrinkage before statistical testing; apply it afterwards, for visualization only
- Avoid complex multi-factor designs without sufficient sample size per condition
- Never transpose the metadata; transpose only the count matrix, and only if genes are rows
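To make the difference between raw and FDR-corrected p-values concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment, the procedure DESeq2-style tools use by default for padj. The `benjamini_hochberg` function and the example p-values are illustrative only. The padj column reported by an actual analysis may differ from this plain calculation, for example when genes are excluded by independent filtering before adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"pvalue={p:.3f}  padj={q:.3f}")
```

Every adjusted value is at least as large as its raw p-value, which is why a gene can pass a raw cutoff of 0.05 yet fail the same cutoff on padj.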
Frequently Asked Questions
Why do I get an index mismatch error?
Should I transpose my count matrix?
What is the difference between pvalue and padj?
When should I use LFC shrinkage?
How do I handle batch effects in my analysis?
Why are no genes significant in my analysis?
Developer Details
Author
K-Dense-AI
License
MIT license
Repository
https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/pydeseq2
Ref
main