
molfeat

Safe · ⚙️ External commands · 📁 Filesystem access · 🌐 Network access

Convert molecules to ML features

Also available from: davila7

Molecular machine learning requires converting chemical structures into numerical representations. Molfeat provides 100+ featurizers to transform SMILES strings into machine learning-ready features for QSAR modeling and drug discovery.

Supports: Claude, Codex, Claude Code (CC)
📊 Score: 70 (Adequate)
1. Download the skill ZIP
2. Upload in Claude: Settings → Capabilities → Skills → Upload skill
3. Toggle on and start using

Test it

Using "molfeat". Convert aspirin (CC(=O)OC1=CC=CC=C1C(=O)O) to ECFP fingerprint

Expected outcome:

  • Generated ECFP fingerprint with radius 3 and 2048 bits
  • Non-zero bits: 45 features activated
  • Bit density: 2.2% (sparse representation)
  • Shape: (2048,) numpy array
  • Ready for machine learning models
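The non-zero-bit and density figures above are plain numpy arithmetic on the returned array. A sketch with a synthetic 2048-bit vector standing in for aspirin's actual ECFP:

```python
import numpy as np

# Synthetic 2048-bit fingerprint with 45 bits set (placeholder, not
# aspirin's real ECFP) to illustrate the density calculation.
rng = np.random.default_rng(0)
fp = np.zeros(2048, dtype=np.int8)
fp[rng.choice(2048, size=45, replace=False)] = 1

n_on = int(fp.sum())
density = 100.0 * n_on / fp.size
print(fp.shape, n_on, f"{density:.1f}%")  # (2048,) 45 2.2%
```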

Using "molfeat". Compare ECFP, MACCS, and RDKit descriptors for caffeine

Expected outcome:

  • ECFP4: 2048-bit vector with 52 non-zero features
  • MACCS: 167-bit structural keys with 28 true bits
  • RDKit2D: 200+ descriptor values including LogP=0.43, TPSA=61.1
  • Combined features: 2415-dimensional vector
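The combined dimensionality is simple concatenation of the three blocks (2048 + 167 + 200 = 2415). A numpy sketch with placeholder arrays in place of real featurizer output:

```python
import numpy as np

# Placeholder feature blocks with the dimensions quoted above; real values
# would come from molfeat's ECFP, MACCS, and RDKit2D featurizers.
ecfp = np.zeros(2048)
maccs = np.zeros(167)
rdkit2d = np.zeros(200)

combined = np.concatenate([ecfp, maccs, rdkit2d])
print(combined.shape)  # (2415,)
```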

Security Audit

Safe
v4 • 1/17/2026

The molfeat skill is a legitimate cheminformatics library for molecular feature extraction. All 397 static findings are false positives triggered by scientific terminology in documentation. The scanner misinterpreted markdown code fences as shell commands, chemistry terminology (ecfp, maccs, gin, c2) as security threats, and documentation URLs as network indicators.

  • Files scanned: 5
  • Lines analyzed: 2,234
  • Findings: 3
  • Total audits: 4

Risk Factors

โš™๏ธ External commands (1)
๐Ÿ“ Filesystem access (1)
๐ŸŒ Network access (1)
Audited by: claude

Quality Score

  • Architecture: 45
  • Maintainability: 100
  • Content: 87
  • Community: 21
  • Security: 100
  • Spec Compliance: 78

What You Can Build

Build QSAR models for drug properties

Use molecular fingerprints and descriptors to train machine learning models predicting ADME properties, toxicity, or bioactivity

Virtual screening of compound libraries

Convert millions of molecules to features for similarity searching and activity prediction against biological targets

Chemical space analysis and clustering

Generate molecular embeddings to visualize and cluster chemical libraries for diversity analysis

Try These Prompts

Basic fingerprint generation
Use molfeat to convert these SMILES to ECFP fingerprints: CCO, CC(=O)O, c1ccccc1. Show the code and output shape.
Batch processing descriptors
Load a dataset of 100 molecules and extract RDKit 2D descriptors using molfeat with parallel processing.
Pretrained model embeddings
Use ChemBERTa to generate embeddings for drug-like molecules and visualize them with PCA.
QSAR pipeline optimization
Compare ECFP, MACCS, and ChemBERTa features for predicting molecular properties using random forest regression.

Best Practices

  • Use n_jobs=-1 for parallel processing on multi-core systems
  • Cache pretrained model embeddings to avoid recomputation
  • Handle invalid molecules with ignore_errors=True for large datasets

Avoid

  • Processing one molecule at a time in loops instead of batch processing
  • Using deep learning models for simple similarity searches where fingerprints suffice
  • Ignoring error handling when processing large compound libraries

Frequently Asked Questions

What is the difference between calculators and transformers?
Calculators process single molecules while transformers handle batches with parallelization and scikit-learn compatibility.
Which featurizer should I use for QSAR modeling?
Start with ECFP fingerprints (radius 2-3, 1024-2048 bits) as they capture molecular connectivity patterns relevant to bioactivity.
How do I handle invalid SMILES strings?
Set ignore_errors=True in MoleculeTransformer to skip invalid molecules and continue processing.
Can I combine multiple featurizers?
Yes, use FeatConcat to combine different feature types like fingerprints and descriptors into a single vector.
Why are pretrained models slower than fingerprints?
Deep learning models require neural network inference, while fingerprints use fast predefined algorithms; the pretrained models, however, offer better transfer learning capabilities.
How do I save and reuse featurizer configurations?
Use transformer.to_state_yaml_file() to save and MoleculeTransformer.from_state_yaml_file() to reload configurations.

Developer Details

File structure

๐Ÿ“ references/

๐Ÿ“„ api_reference.md

๐Ÿ“„ available_featurizers.md

๐Ÿ“„ examples.md

๐Ÿ“„ SKILL.md