datamol
Analyze molecular structures with Datamol
Also available from: davila7
Process chemical data efficiently with a Pythonic interface to RDKit. Datamol simplifies complex cheminformatics operations while maintaining full compatibility with the RDKit ecosystem.
Download the skill ZIP
Upload in Claude
Go to Settings → Capabilities → Skills → Upload skill
Toggle on and start using
Test it
Using "datamol". Standardize these SMILES: OCCO, C(CO)O, ethanol
Expected outcome:
- OCCO → OCCO (canonical SMILES for ethylene glycol)
- C(CO)O → OCCO (same molecule, different input representation)
- ethanol → None (a compound name, not valid SMILES, so parsing returns None)
- Both valid representations standardize to the same canonical form
Using "datamol". Compute descriptors for caffeine
Expected outcome:
- Molecular weight: 194.19 g/mol
- LogP: 0.61
- H-bond donors: 0
- H-bond acceptors: 6
- TPSA: 58.44 Å²
- Number of aromatic atoms: 5
Using "datamol". Find similar molecules to aspirin
Expected outcome:
- Generated ECFP4 fingerprints for query and library
- Calculated Tanimoto similarity matrix
- Top 5 most similar molecules identified
- Similarity scores range from 0.72 to 0.85
- Visualized aligned structures with activity labels
Security Audit
Safe. All 593 static findings are false positives. This is a documentation-only skill containing markdown files with Python code examples. The analyzer misinterpreted markdown code formatting (backticks) as shell commands, chemistry terminology as cryptographic patterns, and RDKit method calls as system reconnaissance. No actual security vulnerabilities exist.
Risk Factors
Contains scripts (1)
Filesystem access (1)
What You Can Build
Analyze compound libraries
Process and standardize molecular datasets, compute drug-likeness properties, and identify promising candidates.
Molecular similarity analysis
Generate fingerprints, calculate similarity matrices, and cluster compounds for virtual screening campaigns.
Feature engineering for ML
Extract molecular descriptors and fingerprints as features for predictive modeling in drug discovery.
Try These Prompts
Use datamol to convert these SMILES strings to standardized molecules: CCO, c1ccccc1, CC(=O)O. Show the canonical SMILES for each.
Calculate molecular weight, logP, H-bond donors and acceptors for these molecules: aspirin (CC(=O)OC1=CC=CC=C1C(=O)O) and caffeine (CN1C=NC2=C1C(=O)N(C(=O)N2C)C).
Generate ECFP fingerprints for these molecules and cluster them: benzene, toluene, phenol, benzoic acid, aniline. Use Tanimoto similarity with 0.3 cutoff.
Generate 50 conformers for cyclohexane, cluster them by RMSD, and identify the most representative conformers. Calculate SASA for each.
Best Practices
- Always standardize molecules from external sources before analysis
- Use parallel processing (n_jobs=-1) for large datasets to improve performance
- Check for None values after molecule parsing to handle invalid inputs gracefully
Avoid
- Don't skip standardization when working with external molecular data
- Avoid full Butina clustering for datasets larger than 1000 molecules; it needs every pairwise distance, so time and memory grow quadratically with dataset size
- Don't use default fingerprints without considering your specific similarity needs
Frequently Asked Questions
What is Datamol?
Do I need to install RDKit separately?
Can Datamol handle large molecular datasets?
What file formats does Datamol support?
How do I visualize molecules?
Is Datamol suitable for machine learning?
Developer Details
Author
K-Dense-AI
License
Apache-2.0 license
Repository
https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/datamol
Ref
main
File structure
references/
  conformers_module.md
  core_api.md
  descriptors_viz.md
  io_module.md
  reactions_data.md
SKILL.md