
umap-learn

Security: Safe · Flagged: external commands

Apply UMAP dimensionality reduction for data visualization

Also available from: davila7

High-dimensional data is difficult to visualize and analyze. UMAP reduces dimensions while preserving structure, enabling clear 2D/3D visualizations and better clustering results.

Supports: Claude Code (CC), Codex
📊 Quality score: 69 (Adequate)

Installation

1. Download the skill ZIP
2. Upload in Claude: go to Settings → Capabilities → Skills → Upload skill
3. Toggle on and start using
Test it

Using "umap-learn". Apply UMAP to visualize my iris dataset in 2D

Expected outcome:

  • Created UMAP embedding with shape (150, 2)
  • Applied StandardScaler preprocessing
  • Generated scatter plot showing three distinct clusters
  • Preserved 92% of local neighborhood structure
  • Ready for interactive exploration of species relationships

Using "umap-learn". Use UMAP to preprocess my customer data for clustering

Expected outcome:

  • Applied clustering-optimized UMAP with n_neighbors=30, min_dist=0.0
  • Reduced to 10 dimensions for HDBSCAN
  • Identified 5 customer segments with HDBSCAN
  • Found 23 noise points (unassigned customers)
  • Density preserved better than direct 2D reduction

Using "umap-learn". Apply supervised UMAP with my labeled dataset

Expected outcome:

  • Used 5000 labeled samples with 50 features
  • Supervised embedding achieved 0.89 cluster separation
  • Classes are clearly visible in 2D visualization
  • Preserved internal structure within each class

Security Audit

Safe
v4 • 1/17/2026

All static findings are false positives. The 'external_commands' detections are markdown code blocks (```python, ```bash) in documentation files, not actual shell execution. No malicious code, network requests, or security risks exist. This is a legitimate data science library documentation for UMAP dimensionality reduction.

  • Files scanned: 3
  • Lines analyzed: 1,740
  • Findings: 1
  • Total audits: 4

Audited by: claude

Quality Score

  • Architecture: 41
  • Maintainability: 100
  • Content: 87
  • Community: 21
  • Security: 100
  • Spec Compliance: 83

What You Can Build

Visualize high-dimensional datasets

Create 2D scatter plots of complex data like gene expression, text embeddings, or customer behavior for pattern discovery.

Preprocess data for clustering

Reduce dimensions before applying HDBSCAN to mitigate the curse of dimensionality and improve cluster quality.

Feature engineering for ML pipelines

Create compact 10-50 dimensional embeddings that preserve structure for downstream classification or regression tasks.

Try These Prompts

Basic visualization
Apply UMAP to reduce my dataset to 2D for visualization. Use standard parameters and create a scatter plot colored by the target variable.
Clustering optimization
Configure UMAP for clustering preprocessing with n_neighbors=30, min_dist=0.0, n_components=10, then apply HDBSCAN to find clusters.
Supervised embedding
Create a supervised UMAP embedding using my class labels to separate categories while preserving internal structure within each class.
Custom metric selection
Apply UMAP with cosine distance for my document embeddings, or use hamming distance for binary feature data.

Best Practices

  • Always standardize features before applying UMAP to ensure equal weighting across dimensions
  • Set random_state parameter for reproducible results across runs
  • Use n_neighbors=30, min_dist=0.0, n_components=10 for clustering preprocessing workflows

Avoid

  • Applying UMAP to raw unscaled data will produce biased embeddings with unequal feature weighting
  • Using default parameters for all tasks without tuning for specific goals reduces effectiveness
  • Assuming UMAP preserves density perfectly - it can create artificial cluster divisions

Frequently Asked Questions

When should I use UMAP vs t-SNE?
Use UMAP for faster computation, better preservation of global structure, and when you need to transform new data. UMAP scales better to larger datasets.
Why are my clusters disconnected?
Increase n_neighbors parameter to emphasize more global structure and connect fragmented components. Values of 50-200 work well.
How do I make results reproducible?
Set the random_state parameter to any integer value. This fixes the stochastic optimization seed for consistent embeddings.
Can UMAP handle categorical variables?
UMAP works with numeric data. Encode categorical variables using one-hot encoding or use hamming distance for binary encoded data.
What is the difference between fit() and fit_transform()?
fit_transform() combines training and transformation in one step. Use fit() followed by transform() when you need to apply the same embedding to new data.
How do I choose the right number of components?
Use 2-3 for visualization, 5-10 for clustering preprocessing, and 10-50 for feature engineering in machine learning pipelines.

Developer Details

File structure