
pysam

Safe • ⚙️ External commands • 📁 Filesystem access

Work with genomic sequencing files

Also available from: davila7

Process and analyze DNA sequencing data with tools for reading BAM, VCF, and FASTQ files. Extract genomic regions, calculate coverage statistics, and integrate multiple file types for comprehensive variant analysis.

Supports: Claude, Codex, Claude Code (CC)
📊 71 Adequate
1. Download the skill ZIP.
2. Upload it in Claude: go to Settings → Capabilities → Skills → Upload skill.
3. Toggle the skill on and start using it.

Test it

Using "pysam": Open a BAM file and show coverage statistics for chromosome 1

Expected outcome:

  • Chromosome 1 statistics:
  • Total reads: 1,245,678
  • Mapped reads: 1,198,432 (96.2%)
  • Mean coverage: 32.4x
  • Regions below 10x coverage: 5,234 positions

Using "pysam": Filter variants by quality and depth

Expected outcome:

  • Filtered 12,456 variants to 3,892 high-quality variants
  • Applied filters: QUAL > 30, DP > 10, MQ > 40
  • Variants written to filtered.vcf

Using "pysam": Extract sequences around variant positions

Expected outcome:

  • Extracted 100 bp sequences for 847 variants
  • Sequences written to variant_contexts.fasta
  • Flanking region: ±50 bp around each variant position

Security Audit

Safe
v4 • 1/17/2026

All 447 static findings are FALSE POSITIVES caused by bioinformatics terminology being misinterpreted as security-relevant patterns. The scanner flags 'SAM' as Windows Security Account Manager when it means Sequence Alignment/Map format, and samtools/bcftools as network scanning tools when they are legitimate bioinformatics command-line utilities. The skill contains only documentation and code examples for legitimate genomic data processing. No actual malicious code, command injection, credential access, or network exfiltration patterns exist.

7 files scanned • 2,265 lines analyzed • 2 findings • 4 total audits
Audited by: claude

Quality Score

  • Architecture: 45
  • Maintainability: 90
  • Content: 85
  • Community: 30
  • Security: 100
  • Spec Compliance: 91

What You Can Build

Variant analysis workflow

Extract and filter genetic variants from VCF files, annotate with read coverage from BAM files

Coverage analysis

Calculate per-base coverage, identify low-coverage regions, generate coverage tracks for visualization
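Low-coverage regions are usually reported as intervals rather than as single positions. Below is a minimal, pure-Python sketch of the merging step. It assumes a per-base depth list, such as the summed `count_coverage()` arrays. `low_coverage_intervals` is a hypothetical name, and the output follows BED's 0-based, half-open convention.

```python
def low_coverage_intervals(depths, start, min_depth=10):
    """Merge consecutive positions with depth < min_depth into intervals.

    depths[i] is the depth at 0-based position start + i; each result is a
    BED-style (begin, end) pair, 0-based and half-open.
    """
    intervals = []
    run_start = None
    for i, d in enumerate(depths):
        if d < min_depth and run_start is None:
            run_start = start + i  # a low-coverage run begins here
        elif d >= min_depth and run_start is not None:
            intervals.append((run_start, start + i))  # run ended just before i
            run_start = None
    if run_start is not None:  # run extends to the last position
        intervals.append((run_start, start + len(depths)))
    return intervals
```

Each `(begin, end)` pair can be written as a `contig<TAB>begin<TAB>end` line to produce a BED track for a genome browser.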

Quality control pipeline

Validate sequencing data, check reference consistency, filter reads by quality thresholds

Try These Prompts

  • Read alignment data: Use pysam to open example.bam and print all reads overlapping chr1 positions 1000-2000
  • Process variants: Open variants.vcf and print all variants on chr2 with quality score above 30
  • Calculate coverage: Calculate per-base coverage for chromosome 1 positions 100000-200000 using pileup analysis
  • Extract sequences: Open reference.fasta and extract the sequence for gene ABC on chr5 from position 10000 to 11000

Best Practices

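  • Index BAM files (.bai) before random access: region queries with fetch(), count() and pileup() require an index, and it avoids scanning the whole file
  • Remember that pysam uses 0-based, half-open coordinates, while the VCF POS column is 1-based (VariantRecord.pos is 1-based; VariantRecord.start is 0-based)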
  • Use pileup() for column-wise coverage analysis instead of repeated fetch() calls

Avoid

  • Loading entire BAM files into memory instead of using iterator-based processing
  • Ignoring coordinate system differences between pysam and VCF file formats
  • Processing large files without creating index files for random access

Frequently Asked Questions

What is the difference between SAM and BAM files?
SAM is a human-readable, tab-delimited text format for alignment data. BAM is its compressed binary equivalent: smaller on disk, faster to parse, and, once coordinate-sorted and indexed, suitable for efficient random access by region.
Do I need to install samtools separately?
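No. pysam bundles htslib and exposes samtools and bcftools commands as Python functions (for example pysam.sort() and pysam.index(), or the pysam.samtools and pysam.bcftools modules), so these operations work without a separate install.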
How do I create an index for my BAM file?
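Use pysam.index('your_file.bam') to create the .bai index (written as your_file.bam.bai). The BAM must be coordinate-sorted first; pysam.sort() wraps samtools sort for this. The index enables fast region-based queries.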
Can pysam filter reads by mapping quality?
Yes, but not through fetch(), which has no mapping-quality parameter. Filter reads manually using the mapping_quality attribute of each AlignedSegment; for column-wise work, pileup() also accepts a min_mapping_quality argument.
What coordinate system does pysam use?
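Pysam uses 0-based, half-open coordinates for programmatic access (start/stop arguments, reference_start, VariantRecord.start). However, region strings passed to fetch(), such as "chr1:1000-2000", are 1-based and inclusive to match the samtools convention; that string selects the same interval as start=999, stop=2000.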
How do I extract variants overlapping a specific gene?
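Use pysam.TabixFile (with parser=pysam.asBed()) to read a bgzip-compressed, tabix-indexed BED file of gene coordinates, then call fetch() on an indexed pysam.VariantFile for each interval to get the overlapping variants. BED coordinates are 0-based and half-open, so they can be passed to fetch() unchanged.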