
vector-index-tuning

Safe · 🌐 Network access

Optimize vector index tuning for speed and recall

Vector search feels slow or costly when indexes are misconfigured. This skill provides tuning templates and heuristics to improve latency, recall, and memory use for HNSW and quantization strategies.

Supports: Claude, Codex, Claude Code (CC)
📊 Quality score: 70 (Adequate)
1. Download the skill ZIP
2. Upload in Claude: go to Settings → Capabilities → Skills → Upload skill
3. Toggle on and start using

Test it

Using "vector-index-tuning". Suggest HNSW parameters for 1M vectors with 0.95 recall and under 10 ms latency.

Expected outcome:

  • Recommended M: 32 and efConstruction: 200 for build quality
  • Set efSearch to 128 to target 0.95 recall
  • Estimate memory overhead with M at 32 and validate with a small benchmark
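The memory-overhead estimate above can be sketched with the rough per-element formula from the hnswlib documentation, roughly `dim * 4 + M * 2 * 4` bytes for FP32 vectors plus level-0 graph links (higher layers add a few percent more and are ignored here); the function name is illustrative, not part of the skill:

```python
# Rough HNSW memory estimate: FP32 vector storage plus level-0 links.
# Approximation from the hnswlib docs; upper graph layers add a small
# extra overhead that is ignored in this sketch.
def hnsw_memory_bytes(num_vectors: int, dim: int, M: int) -> int:
    bytes_per_vector = dim * 4 + M * 2 * 4  # FP32 data + level-0 links
    return num_vectors * bytes_per_vector

est = hnsw_memory_bytes(1_000_000, 768, 32)
print(f"{est / 1024**3:.2f} GiB")  # ~3.10 GiB for 1M x 768-dim, M=32
```

A small benchmark on a sample of real queries should still validate the estimate, since allocator overhead and index metadata vary by library.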

Using "vector-index-tuning". What memory savings can I get by switching from FP32 to INT8 quantization?

Expected outcome:

  • FP32 uses 4 bytes per dimension, INT8 uses 1 byte
  • For 768-dim vectors: FP32 = 3,072 bytes (3 KB) per vector, INT8 = 768 bytes per vector
  • Approximately 75% memory reduction with minor recall impact
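The arithmetic above, plus a minimal symmetric scalar quantizer, can be sketched in pure Python. The per-vector scale here is a simplification (production engines typically quantize per dimension or per block), and the function names are illustrative:

```python
# Memory math for FP32 vs INT8, plus a toy symmetric scalar quantizer.
def memory_per_vector(dim: int, bytes_per_dim: int) -> int:
    return dim * bytes_per_dim

fp32 = memory_per_vector(768, 4)  # 3072 bytes
int8 = memory_per_vector(768, 1)  # 768 bytes
print(f"saving: {1 - int8 / fp32:.0%}")  # saving: 75%

def quantize_int8(vec):
    """Symmetric scalar quantization: map floats into [-127, 127]."""
    maxabs = max(abs(x) for x in vec) or 1.0  # avoid divide-by-zero
    codes = [round(x * 127 / maxabs) for x in vec]
    return codes, maxabs / 127  # codes and the dequantization scale

codes, scale = quantize_int8([0.5, -1.0, 0.25])
print(codes)  # [64, -127, 32]
```

Dequantizing (`code * scale`) recovers values with small rounding error, which is where the "minor recall impact" comes from.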

Using "vector-index-tuning". How do I choose between IVF and HNSW for 50M vectors?

Expected outcome:

  • HNSW: better recall at cost of memory and build time
  • IVF: lower memory, faster build, slightly lower recall
  • Consider hybrid: IVF-PQ for 50M+ vectors when memory constrained
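A back-of-envelope comparison makes the hybrid recommendation concrete. The sketch below assumes 768-dim FP32 vectors, HNSW with M = 32, and IVF-PQ with 96 subquantizers of 8 bits (1 byte per code); codebook and coarse-centroid overhead is small and ignored:

```python
# Memory comparison at 50M vectors: full-precision HNSW vs IVF-PQ.
# Assumed parameters: 768 dims, M=32, 96 one-byte PQ codes per vector.
N, DIM, M, PQ_M = 50_000_000, 768, 32, 96

hnsw_gb = N * (DIM * 4 + M * 2 * 4) / 1e9  # FP32 data + level-0 links
ivfpq_gb = N * PQ_M / 1e9                  # 1 byte per subquantizer code

print(f"HNSW:   {hnsw_gb:.1f} GB")   # HNSW:   166.4 GB
print(f"IVF-PQ: {ivfpq_gb:.1f} GB")  # IVF-PQ: 4.8 GB
```

A ~35x memory gap like this is why IVF-PQ (or HNSW over quantized vectors) becomes attractive at 50M+ scale, trading some recall for feasibility.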

Security Audit

Safe
v4 • 1/17/2026

Pure documentation skill with instructional Python templates for vector index tuning. All static findings are false positives: hardcoded URLs are documentation references, weak crypto patterns matched legitimate quantization terminology, backticks are markdown formatting, and memory-mapped references are Qdrant config parameters.

  • Files scanned: 2
  • Lines analyzed: 723
  • Findings: 1
  • Total audits: 4

Risk Factors

🌐 Network access (1)
Audited by: claude

Quality Score

  • Architecture: 38
  • Maintainability: 100
  • Content: 85
  • Community: 30
  • Security: 100
  • Spec Compliance: 87

What You Can Build

Tune ANN for recall

Find HNSW settings that meet recall targets without exceeding latency budgets.

Reduce memory footprint

Evaluate quantization options and estimate storage tradeoffs at scale.

Plan index scaling

Select index types and configurations for millions to billions of vectors.

Try These Prompts

Quick HNSW sweep
Benchmark HNSW M and efSearch for 200k vectors targeting 0.95 recall. Suggest the best balanced configuration.
Quantization choice
Compare fp16, int8, and product quantization for 10M vectors of 768 dims. Summarize memory and recall impacts.
Qdrant config
Create Qdrant collection settings for balanced recall and speed with 5M vectors. Include HNSW and quantization configs.
Monitoring plan
Define metrics and a testing loop to track latency percentiles and recall drift for weekly index updates.
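For the monitoring-plan prompt above, a minimal sketch of the core metric is computing latency percentiles from raw per-query timings. This uses the nearest-rank method on hypothetical sample data; production systems usually read percentiles from histograms instead:

```python
# Latency percentiles via the nearest-rank method (sketch only).
def percentile(samples, p):
    ordered = sorted(samples)
    idx = max(0, round(p / 100 * len(ordered)) - 1)  # nearest rank
    return ordered[idx]

latencies_ms = [2.1, 3.4, 2.8, 9.7, 3.1, 2.5, 15.2, 3.0, 2.9, 3.3]
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p):.1f} ms")
```

Tracking p95/p99 alongside recall after each weekly index update catches tail-latency regressions that averages hide.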

Best Practices

  • Benchmark with real queries and a ground truth set for accurate recall measurement
  • Start with default parameters, then tune one variable at a time systematically
  • Track latency percentiles and recall after each configuration change
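The recall measurement the practices above call for can be sketched as a recall@k check against a ground-truth set. Here `results` and `truth` map query ids to ranked neighbor ids (hypothetical data, not part of the skill's templates):

```python
# Recall@k against a ground-truth neighbor set (sketch).
def recall_at_k(results: dict, truth: dict, k: int) -> float:
    hits = total = 0
    for qid, true_ids in truth.items():
        retrieved = set(results.get(qid, [])[:k])
        hits += len(retrieved & set(true_ids[:k]))  # overlap with truth
        total += min(k, len(true_ids))
    return hits / total if total else 0.0

results = {0: [5, 2, 9], 1: [7, 1, 3]}
truth   = {0: [5, 9, 4], 1: [7, 3, 8]}
print(round(recall_at_k(results, truth, 3), 3))  # 4 hits of 6 -> 0.667
```

Ground truth is typically produced once with an exact (brute-force) search over a query sample, then reused across all tuning runs.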

Avoid

  • Tuning without measuring recall against a known ground truth set
  • Changing multiple parameters simultaneously without controlled experiments
  • Ignoring memory overhead when increasing M or efSearch values

Frequently Asked Questions

What platforms does this skill support?
Works with Claude, Codex, and Claude Code. Provides general guidance with Qdrant-specific examples.
What are the main limits of the templates?
Templates are Python examples requiring libraries like hnswlib and sklearn to run. Users must provide their own data and queries.
Can I integrate this into my pipeline?
Yes. Use templates as building blocks in benchmarking scripts, CI jobs, or performance testing workflows.
Does it access or send my data?
No. The skill content is static documentation. No data collection or network calls occur from the skill itself.
What if benchmark results are noisy?
Increase query sample size, fix random seeds, and separate index build timing from search timing measurements.
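The advice above (fixed seeds, separated build and search timing) can be sketched as follows. A brute-force scan stands in for the index so the pattern runs without hnswlib; swap in your real build and search calls:

```python
# Separate build timing from search timing, with a fixed seed for a
# reproducible query sample. Brute-force scan is a stand-in index.
import random
import time

random.seed(42)  # fixed seed -> same query sample every run
data = [[random.random() for _ in range(8)] for _ in range(1000)]
queries = random.sample(data, 50)

t0 = time.perf_counter()
index = list(data)                 # stand-in for a real index build
build_s = time.perf_counter() - t0

t0 = time.perf_counter()
for q in queries:                  # stand-in for index.search(q, k)
    min(index, key=lambda v: sum((a - b) ** 2 for a, b in zip(q, v)))
search_s = time.perf_counter() - t0

print(f"build: {build_s*1e3:.2f} ms, search: {search_s*1e3:.2f} ms")
```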
How does this compare to generic tuning guides?
Provides concrete Python templates, parameter ranges, memory estimation formulas, and Qdrant-specific configurations.

Developer Details

File structure

📄 SKILL.md