
umap-learn

Security: Safe · Flagged: external commands

Apply UMAP dimensionality reduction for data visualization

Also available from: davila7

High-dimensional data is difficult to visualize and analyze. UMAP reduces dimensions while preserving structure, enabling clear 2D/3D visualizations and better clustering results.

Supports: Claude Code (CC), Codex
📊 Quality score: 69 (Adequate)

Installation

1. Download the skill ZIP
2. Upload in Claude: go to Settings → Capabilities → Skills → Upload skill
3. Toggle on and start using
Test it

Using "umap-learn". Apply UMAP to visualize my iris dataset in 2D

Expected outcome:

  • Created UMAP embedding with shape (150, 2)
  • Applied StandardScaler preprocessing
  • Generated scatter plot showing three distinct clusters
  • Preserved 92% of local neighborhood structure
  • Ready for interactive exploration of species relationships

Using "umap-learn". Use UMAP to preprocess my customer data for clustering

Expected outcome:

  • Applied clustering-optimized UMAP with n_neighbors=30, min_dist=0.0
  • Reduced to 10 dimensions for HDBSCAN
  • Identified 5 customer segments with HDBSCAN
  • Found 23 noise points (unassigned customers)
  • Density preserved better than direct 2D reduction

Using "umap-learn". Apply supervised UMAP with my labeled dataset

Expected outcome:

  • Used 5000 labeled samples with 50 features
  • Supervised embedding achieved 0.89 cluster separation
  • Classes are clearly visible in 2D visualization
  • Preserved internal structure within each class

Security Audit

Safe
v4 • 1/17/2026

All static findings are false positives. The 'external_commands' detections are markdown code blocks (```python, ```bash) in documentation files, not actual shell execution. No malicious code, network requests, or security risks exist. This is a legitimate data science library documentation for UMAP dimensionality reduction.

  • Files scanned: 3
  • Lines analyzed: 1,740
  • Findings: 1
  • Total audits: 4

Audited by: claude

Quality Score

  • Architecture: 41
  • Maintainability: 100
  • Content: 87
  • Community: 21
  • Security: 100
  • Spec Compliance: 83

What You Can Build

Visualize high-dimensional datasets

Create 2D scatter plots of complex data like gene expression, text embeddings, or customer behavior for pattern discovery.

Preprocess data for clustering

Reduce dimensions before applying HDBSCAN to mitigate the curse of dimensionality and improve cluster quality.

Feature engineering for ML pipelines

Create compact 10-50 dimensional embeddings that preserve structure for downstream classification or regression tasks.

Try These Prompts

Basic visualization
Apply UMAP to reduce my dataset to 2D for visualization. Use standard parameters and create a scatter plot colored by the target variable.
Clustering optimization
Configure UMAP for clustering preprocessing with n_neighbors=30, min_dist=0.0, n_components=10, then apply HDBSCAN to find clusters.
Supervised embedding
Create a supervised UMAP embedding using my class labels to separate categories while preserving internal structure within each class.
Custom metric selection
Apply UMAP with cosine distance for my document embeddings, or use hamming distance for binary feature data.

Best Practices

  • Always standardize features before applying UMAP to ensure equal weighting across dimensions
  • Set random_state parameter for reproducible results across runs
  • Use n_neighbors=30, min_dist=0.0, n_components=10 for clustering preprocessing workflows

Avoid

  • Applying UMAP to raw unscaled data will produce biased embeddings with unequal feature weighting
  • Using default parameters for all tasks without tuning for specific goals reduces effectiveness
  • Assuming UMAP preserves density perfectly - it can create artificial cluster divisions

Frequently Asked Questions

When should I use UMAP vs t-SNE?
Use UMAP for faster computation, better preservation of global structure, and when you need to transform new data. UMAP scales better to larger datasets.
Why are my clusters disconnected?
Increase n_neighbors parameter to emphasize more global structure and connect fragmented components. Values of 50-200 work well.
How do I make results reproducible?
Set the random_state parameter to any integer value. This fixes the stochastic optimization seed for consistent embeddings.
Can UMAP handle categorical variables?
UMAP works with numeric data. Encode categorical variables using one-hot encoding or use hamming distance for binary encoded data.
What is the difference between fit() and fit_transform()?
fit_transform() combines training and transformation in one step. Use fit() followed by transform() when you need to apply the same embedding to new data.
How do I choose the right number of components?
Use 2-3 for visualization, 5-10 for clustering preprocessing, and 10-50 for feature engineering in machine learning pipelines.

Developer Details

File structure