umap-learn
Apply UMAP dimensionality reduction for data visualization
Also available from: davila7
High-dimensional data is difficult to visualize and analyze. UMAP reduces dimensions while preserving structure, enabling clear 2D/3D visualizations and better clustering results.
1. Download the skill ZIP
2. Upload in Claude: go to Settings → Capabilities → Skills → Upload skill
3. Toggle the skill on and start using it
Test it
Using "umap-learn". Apply UMAP to visualize my iris dataset in 2D
Expected outcome:
- Created UMAP embedding with shape (150, 2)
- Applied StandardScaler preprocessing
- Generated scatter plot showing three distinct clusters
- Preserved 92% of local neighborhood structure
- Ready for interactive exploration of species relationships
Using "umap-learn". Use UMAP to preprocess my customer data for clustering
Expected outcome:
- Applied clustering-optimized UMAP with n_neighbors=30, min_dist=0.0
- Reduced to 10 dimensions for HDBSCAN
- Identified 5 customer segments with HDBSCAN
- Found 23 noise points (unassigned customers)
- Density preserved better than direct 2D reduction
Using "umap-learn". Apply supervised UMAP with my labeled dataset
Expected outcome:
- Used 5000 labeled samples with 50 features
- Supervised embedding achieved 0.89 cluster separation
- Classes are clearly visible in 2D visualization
- Preserved internal structure within each class
Security Audit
Safe. All static findings are false positives: the 'external_commands' detections are markdown code fences (```python, ```bash) in documentation files, not actual shell execution. No malicious code, network requests, or security risks were found. This is legitimate documentation for the UMAP dimensionality-reduction library.
What You Can Build
Visualize high-dimensional datasets
Create 2D scatter plots of complex data like gene expression, text embeddings, or customer behavior for pattern discovery.
Preprocess data for clustering
Reduce dimensions before applying HDBSCAN to overcome curse of dimensionality and improve cluster quality.
Feature engineering for ML pipelines
Create compact 10-50 dimensional embeddings that preserve structure for downstream classification or regression tasks.
Try These Prompts
Apply UMAP to reduce my dataset to 2D for visualization. Use standard parameters and create a scatter plot colored by the target variable.
Configure UMAP for clustering preprocessing with n_neighbors=30, min_dist=0.0, n_components=10, then apply HDBSCAN to find clusters.
Create a supervised UMAP embedding using my class labels to separate categories while preserving internal structure within each class.
Apply UMAP with cosine distance for my document embeddings, or use hamming distance for binary feature data.
Best Practices
- Always standardize features before applying UMAP to ensure equal weighting across dimensions
- Set random_state parameter for reproducible results across runs
- Use n_neighbors=30, min_dist=0.0, n_components=10 for clustering preprocessing workflows
Avoid
- Applying UMAP to raw, unscaled data: it produces embeddings biased toward high-variance features
- Using default parameters for every task: tune n_neighbors and min_dist for your specific goal
- Assuming UMAP preserves density perfectly: it can create artificial cluster divisions
Frequently Asked Questions
When should I use UMAP vs t-SNE?
Why are my clusters disconnected?
How do I make results reproducible?
Can UMAP handle categorical variables?
What is the difference between fit() and fit_transform()?
How do I choose the right number of components?
Developer Details
Author
K-Dense-AI
License
BSD-3-Clause license
Repository
https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/umap-learn
Ref
main
File structure