
molfeat

Safe · ⚙️ External commands · 📁 Filesystem access · 🌐 Network access

Convert molecules to ML features

Also available from: davila7

Molecular machine learning requires converting chemical structures into numerical representations. Molfeat provides 100+ featurizers to transform SMILES strings into machine learning-ready features for QSAR modeling and drug discovery.

Supports: Claude, Codex, Claude Code (CC)
📊 Score: 70 (Adequate)
1. Download the skill ZIP
2. Upload in Claude: Settings → Capabilities → Skills → Upload skill
3. Toggle on and start using

Test it

Using "molfeat". Convert aspirin (CC(=O)OC1=CC=CC=C1C(=O)O) to ECFP fingerprint

Expected outcome:

  • Generated ECFP fingerprint with radius 3 and 2048 bits
  • Non-zero bits: 45 features activated
  • Bit density: 2.2% (sparse representation)
  • Shape: (2048,) numpy array
  • Ready for machine learning models
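The non-zero-bit and density figures above are plain numpy arithmetic on the returned array. A sketch with a synthetic 2048-bit vector standing in for aspirin's actual ECFP:

```python
import numpy as np

# Synthetic 2048-bit fingerprint with 45 bits set (placeholder, not
# aspirin's real ECFP) to illustrate the density calculation.
rng = np.random.default_rng(0)
fp = np.zeros(2048, dtype=np.int8)
fp[rng.choice(2048, size=45, replace=False)] = 1

n_on = int(fp.sum())
density = 100.0 * n_on / fp.size
print(fp.shape, n_on, f"{density:.1f}%")  # (2048,) 45 2.2%
```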

Using "molfeat". Compare ECFP, MACCS, and RDKit descriptors for caffeine

Expected outcome:

  • ECFP4: 2048-bit vector with 52 non-zero features
  • MACCS: 167-bit structural keys with 28 true bits
  • RDKit2D: 200+ descriptor values including LogP=0.43, TPSA=61.1
  • Combined features: 2415-dimensional vector
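The combined dimensionality is simple concatenation of the three blocks (2048 + 167 + 200 = 2415). A numpy sketch with placeholder arrays in place of real featurizer output:

```python
import numpy as np

# Placeholder feature blocks with the dimensions quoted above; real values
# would come from molfeat's ECFP, MACCS, and RDKit2D featurizers.
ecfp = np.zeros(2048)
maccs = np.zeros(167)
rdkit2d = np.zeros(200)

combined = np.concatenate([ecfp, maccs, rdkit2d])
print(combined.shape)  # (2415,)
```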

Security Audit

Safe
v4 • 1/17/2026

The molfeat skill is a legitimate cheminformatics library for molecular feature extraction. All 397 static findings are false positives triggered by scientific terminology in documentation. The scanner misinterpreted markdown code fences as shell commands, chemistry terminology (ecfp, maccs, gin, c2) as security threats, and documentation URLs as network indicators.

  • Files scanned: 5
  • Lines analyzed: 2,234
  • Findings: 3
  • Total audits: 4

Risk Factors

โš™๏ธ External commands (1)
๐Ÿ“ Filesystem access (1)
๐ŸŒ Network access (1)
Audited by: claude

Quality Score

  • Architecture: 45
  • Maintainability: 100
  • Content: 87
  • Community: 21
  • Security: 100
  • Spec Compliance: 78

What You Can Build

Build QSAR models for drug properties

Use molecular fingerprints and descriptors to train machine learning models predicting ADME properties, toxicity, or bioactivity

Virtual screening of compound libraries

Convert millions of molecules to features for similarity searching and activity prediction against biological targets

Chemical space analysis and clustering

Generate molecular embeddings to visualize and cluster chemical libraries for diversity analysis

Try These Prompts

Basic fingerprint generation
Use molfeat to convert these SMILES to ECFP fingerprints: CCO, CC(=O)O, c1ccccc1. Show the code and output shape.
Batch processing descriptors
Load a dataset of 100 molecules and extract RDKit 2D descriptors using molfeat with parallel processing.
Pretrained model embeddings
Use ChemBERTa to generate embeddings for drug-like molecules and visualize them with PCA.
QSAR pipeline optimization
Compare ECFP, MACCS, and ChemBERTa features for predicting molecular properties using random forest regression.

Best Practices

  • Use n_jobs=-1 for parallel processing on multi-core systems
  • Cache pretrained model embeddings to avoid recomputation
  • Handle invalid molecules with ignore_errors=True for large datasets

Avoid

  • Processing one molecule at a time in loops instead of batch processing
  • Using deep learning models for simple similarity searches where fingerprints suffice
  • Ignoring error handling when processing large compound libraries

Frequently Asked Questions

What is the difference between calculators and transformers?
Calculators process single molecules while transformers handle batches with parallelization and scikit-learn compatibility.
Which featurizer should I use for QSAR modeling?
Start with ECFP fingerprints (radius 2-3, 1024-2048 bits) as they capture molecular connectivity patterns relevant to bioactivity.
How do I handle invalid SMILES strings?
Set ignore_errors=True in MoleculeTransformer to skip invalid molecules and continue processing.
Can I combine multiple featurizers?
Yes, use FeatConcat to combine different feature types like fingerprints and descriptors into a single vector.
Why are pretrained models slower than fingerprints?
Deep learning models require neural network inference, while fingerprints use fast predefined algorithms; the pretrained models, however, offer better transfer learning capabilities.
How do I save and reuse featurizer configurations?
Use transformer.to_state_yaml_file() to save and MoleculeTransformer.from_state_yaml_file() to reload configurations.

Developer Details

File structure

๐Ÿ“ references/

๐Ÿ“„ api_reference.md

๐Ÿ“„ available_featurizers.md

๐Ÿ“„ examples.md

๐Ÿ“„ SKILL.md