datamol

Name: datamol
Author: K-Dense-AI

Safe ⚡ Contains scripts 📁 Filesystem access

Analyze molecular structures with Datamol

Also available from: davila7

Process chemical data efficiently with a Pythonic interface to RDKit. Datamol simplifies complex cheminformatics operations while maintaining full compatibility with the RDKit ecosystem.

Supports: Claude, Codex, Claude Code (CC)

📊 70 Adequate

Download the skill ZIP

Upload in Claude

Go to Settings → Capabilities → Skills → Upload skill

Toggle on and start using

Test it

Using "datamol". Standardize these SMILES: OCCO, C(CO)O, ethanol

Expected outcome:

OCCO → OCCO (canonical SMILES for ethylene glycol)
C(CO)O → OCCO (same molecule, different representation)
ethanol → None (a compound name, not a valid SMILES string, so parsing returns None)
Both valid inputs encode ethylene glycol and standardize to the same canonical form

Using "datamol". Compute descriptors for caffeine

Expected outcome:

Molecular weight: 194.19 g/mol
LogP: 0.61
H-bond donors: 0
H-bond acceptors: 6
TPSA: 58.44 Å²
Number of aromatic atoms: 9 (both fused rings are aromatic under RDKit's aromaticity model)

Using "datamol". Find similar molecules to aspirin

Expected outcome:

Generated ECFP4 fingerprints for query and library
Calculated Tanimoto similarity matrix
Top 5 most similar molecules identified
Similarity scores range from 0.72 to 0.85
Visualized aligned structures with activity labels
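The ranking step behind a similarity search can be illustrated without any chemistry library. The sketch below treats each fingerprint as a set of on-bit indices and scores pairs with the Tanimoto coefficient, |A ∩ B| / |A ∪ B|. The bit sets and molecule names are hypothetical toy data, and top_k_similar is an illustrative helper; in practice the bits would come from ECFP fingerprints generated with datamol or RDKit.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def top_k_similar(
    query: Set[int], library: Dict[str, Set[int]], k: int = 5
) -> List[Tuple[str, float]]:
    """Rank library entries by similarity to the query, highest first."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy data: hypothetical on-bit indices, not real fingerprints.
query_bits = {1, 4, 9, 16, 25}
library = {
    "mol_a": {1, 4, 9, 16, 36},  # shares 4 bits with the query
    "mol_b": {1, 4, 49},         # shares 2 bits with the query
    "mol_c": {2, 3, 5},          # shares none
}

ranked = top_k_similar(query_bits, library, k=2)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```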

Security Audit

Safe

v4 • 1/17/2026

All 593 static findings are false positives. This is a documentation-only skill containing markdown files with Python code examples. The analyzer misinterpreted markdown code formatting (backticks) as shell commands, chemistry terminology as cryptographic patterns, and RDKit method calls as system reconnaissance. No actual security vulnerabilities exist.

Lines analyzed: 3,724
Risk Factors

⚡ Contains scripts (1)

SKILL.md:1-704

📁 Filesystem access (1)

references/core_api.md:1

Audited by: claude

Quality Score

Architecture

100

Maintainability

Content

Community

100

Security

Spec Compliance

What You Can Build

Analyze compound libraries

Process and standardize molecular datasets, compute drug-likeness properties, and identify promising candidates.

Molecular similarity analysis

Generate fingerprints, calculate similarity matrices, and cluster compounds for virtual screening campaigns.

Feature engineering for ML

Extract molecular descriptors and fingerprints as features for predictive modeling in drug discovery.

Try These Prompts

Basic molecule processing

Use datamol to convert these SMILES strings to standardized molecules: CCO, c1ccccc1, CC(=O)O. Show the canonical SMILES for each.

Compute molecular properties

Calculate molecular weight, logP, H-bond donors and acceptors for these molecules: aspirin (CC(=O)OC1=CC=CC=C1C(=O)O) and caffeine (CN1C=NC2=C1C(=O)N(C(=O)N2C)C).

Cluster molecular datasets

Generate ECFP fingerprints for these molecules and cluster them: benzene, toluene, phenol, benzoic acid, aniline. Use Tanimoto similarity with 0.3 cutoff.

3D conformer analysis

Generate 50 conformers for cyclohexane, cluster them by RMSD, and identify the most representative conformers. Calculate SASA for each.

Best Practices

Always standardize molecules from external sources before analysis
Use parallel processing (n_jobs=-1) for large datasets to improve performance
Check for None values after molecule parsing to handle invalid inputs gracefully

Avoid

Don't skip standardization when working with external molecular data
Avoid full Butina clustering for datasets larger than 1000 molecules
Don't use default fingerprints without considering your specific similarity needs

Frequently Asked Questions

What is Datamol?

Datamol is a Python library that provides a simplified interface to RDKit for molecular cheminformatics operations.

Do I need to install RDKit separately?

Yes, Datamol is a wrapper around RDKit, so you need both installed: 'uv pip install datamol rdkit'.

Can Datamol handle large molecular datasets?

Yes, it supports parallel processing for most operations and can handle thousands of molecules efficiently.

What file formats does Datamol support?

SDF, SMILES, CSV, Excel, MOL, Mol2, PDB, and remote files via fsspec (S3, GCS, HTTP).

How do I visualize molecules?

Use dm.viz.to_image() for basic visualization or dm.viz.conformers() for 3D conformer visualization.

Is Datamol suitable for machine learning?

Yes, it provides molecular descriptors and fingerprints that can be used as features for ML models.

Developer Details

Author

K-Dense-AI

License

Apache-2.0 license

Repository

https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/datamol

Ref

main

File structure

📁 references/

📄 conformers_module.md

📄 core_api.md

📄 descriptors_viz.md

📄 fragments_scaffolds.md

📄 io_module.md

📄 reactions_data.md

📄 SKILL.md