
vector-index-tuning

Safe · 🌐 Network access

Optimize vector index tuning for speed and recall

Vector search feels slow or costly when indexes are misconfigured. This skill provides tuning templates and heuristics to improve latency, recall, and memory use for HNSW and quantization strategies.

Supports: Claude, Codex, Claude Code (CC)
📊 Quality score: 70 (Adequate)
1. Download the skill ZIP
2. Upload in Claude: go to Settings → Capabilities → Skills → Upload skill
3. Toggle on and start using

Test it

Using "vector-index-tuning". Suggest HNSW parameters for 1M vectors with 0.95 recall and under 10 ms latency.

Expected outcome:

  • Recommended M: 32 and efConstruction: 200 for build quality
  • Set efSearch to 128 to target 0.95 recall
  • Estimate memory overhead with M at 32 and validate with a small benchmark
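The memory-overhead estimate above can be sketched with the rough per-element formula from the hnswlib documentation, roughly `dim * 4 + M * 2 * 4` bytes for FP32 vectors plus level-0 graph links (higher layers add a few percent more and are ignored here); the function name is illustrative, not part of the skill:

```python
# Rough HNSW memory estimate: FP32 vector storage plus level-0 links.
# Approximation from the hnswlib docs; upper graph layers add a small
# extra overhead that is ignored in this sketch.
def hnsw_memory_bytes(num_vectors: int, dim: int, M: int) -> int:
    bytes_per_vector = dim * 4 + M * 2 * 4  # FP32 data + level-0 links
    return num_vectors * bytes_per_vector

est = hnsw_memory_bytes(1_000_000, 768, 32)
print(f"{est / 1024**3:.2f} GiB")  # ~3.10 GiB for 1M x 768-dim, M=32
```

A small benchmark on a sample of real queries should still validate the estimate, since allocator overhead and index metadata vary by library.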

Using "vector-index-tuning". What memory savings can I get by switching from FP32 to INT8 quantization?

Expected outcome:

  • FP32 uses 4 bytes per dimension, INT8 uses 1 byte
  • For 768-dim vectors: FP32 = 3,072 bytes (3 KB) per vector, INT8 = 768 bytes per vector
  • Approximately 75% memory reduction with minor recall impact
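The arithmetic above, plus a minimal symmetric scalar quantizer, can be sketched in pure Python. The per-vector scale here is a simplification (production engines typically quantize per dimension or per block), and the function names are illustrative:

```python
# Memory math for FP32 vs INT8, plus a toy symmetric scalar quantizer.
def memory_per_vector(dim: int, bytes_per_dim: int) -> int:
    return dim * bytes_per_dim

fp32 = memory_per_vector(768, 4)  # 3072 bytes
int8 = memory_per_vector(768, 1)  # 768 bytes
print(f"saving: {1 - int8 / fp32:.0%}")  # saving: 75%

def quantize_int8(vec):
    """Symmetric scalar quantization: map floats into [-127, 127]."""
    maxabs = max(abs(x) for x in vec) or 1.0  # avoid divide-by-zero
    codes = [round(x * 127 / maxabs) for x in vec]
    return codes, maxabs / 127  # codes and the dequantization scale

codes, scale = quantize_int8([0.5, -1.0, 0.25])
print(codes)  # [64, -127, 32]
```

Dequantizing (`code * scale`) recovers values with small rounding error, which is where the "minor recall impact" comes from.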

Using "vector-index-tuning". How do I choose between IVF and HNSW for 50M vectors?

Expected outcome:

  • HNSW: better recall at cost of memory and build time
  • IVF: lower memory, faster build, slightly lower recall
  • Consider hybrid: IVF-PQ for 50M+ vectors when memory constrained
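A back-of-envelope comparison makes the hybrid recommendation concrete. The sketch below assumes 768-dim FP32 vectors, HNSW with M = 32, and IVF-PQ with 96 subquantizers of 8 bits (1 byte per code); codebook and coarse-centroid overhead is small and ignored:

```python
# Memory comparison at 50M vectors: full-precision HNSW vs IVF-PQ.
# Assumed parameters: 768 dims, M=32, 96 one-byte PQ codes per vector.
N, DIM, M, PQ_M = 50_000_000, 768, 32, 96

hnsw_gb = N * (DIM * 4 + M * 2 * 4) / 1e9  # FP32 data + level-0 links
ivfpq_gb = N * PQ_M / 1e9                  # 1 byte per subquantizer code

print(f"HNSW:   {hnsw_gb:.1f} GB")   # HNSW:   166.4 GB
print(f"IVF-PQ: {ivfpq_gb:.1f} GB")  # IVF-PQ: 4.8 GB
```

A ~35x memory gap like this is why IVF-PQ (or HNSW over quantized vectors) becomes attractive at 50M+ scale, trading some recall for feasibility.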

Security Audit

Safe
v4 • 1/17/2026

Pure documentation skill with instructional Python templates for vector index tuning. All static findings are false positives: hardcoded URLs are documentation references, weak crypto patterns matched legitimate quantization terminology, backticks are markdown formatting, and memory-mapped references are Qdrant config parameters.

  • Files scanned: 2
  • Lines analyzed: 723
  • Findings: 1
  • Total audits: 4

Risk Factors

🌐 Network access (1)
Audited by: claude

Quality Score

  • Architecture: 38
  • Maintainability: 100
  • Content: 85
  • Community: 30
  • Security: 100
  • Spec Compliance: 87

What You Can Build

Tune ANN for recall

Find HNSW settings that meet recall targets without exceeding latency budgets.

Reduce memory footprint

Evaluate quantization options and estimate storage tradeoffs at scale.

Plan index scaling

Select index types and configurations for millions to billions of vectors.

Try These Prompts

Quick HNSW sweep
Benchmark HNSW M and efSearch for 200k vectors targeting 0.95 recall. Suggest the best balanced configuration.
Quantization choice
Compare fp16, int8, and product quantization for 10M vectors of 768 dims. Summarize memory and recall impacts.
Qdrant config
Create Qdrant collection settings for balanced recall and speed with 5M vectors. Include HNSW and quantization configs.
Monitoring plan
Define metrics and a testing loop to track latency percentiles and recall drift for weekly index updates.
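For the monitoring-plan prompt above, a minimal sketch of the core metric is computing latency percentiles from raw per-query timings. This uses the nearest-rank method on hypothetical sample data; production systems usually read percentiles from histograms instead:

```python
# Latency percentiles via the nearest-rank method (sketch only).
def percentile(samples, p):
    ordered = sorted(samples)
    idx = max(0, round(p / 100 * len(ordered)) - 1)  # nearest rank
    return ordered[idx]

latencies_ms = [2.1, 3.4, 2.8, 9.7, 3.1, 2.5, 15.2, 3.0, 2.9, 3.3]
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p):.1f} ms")
```

Tracking p95/p99 alongside recall after each weekly index update catches tail-latency regressions that averages hide.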

Best Practices

  • Benchmark with real queries and a ground truth set for accurate recall measurement
  • Start with default parameters, then tune one variable at a time systematically
  • Track latency percentiles and recall after each configuration change
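The recall measurement the practices above call for can be sketched as a recall@k check against a ground-truth set. Here `results` and `truth` map query ids to ranked neighbor ids (hypothetical data, not part of the skill's templates):

```python
# Recall@k against a ground-truth neighbor set (sketch).
def recall_at_k(results: dict, truth: dict, k: int) -> float:
    hits = total = 0
    for qid, true_ids in truth.items():
        retrieved = set(results.get(qid, [])[:k])
        hits += len(retrieved & set(true_ids[:k]))  # overlap with truth
        total += min(k, len(true_ids))
    return hits / total if total else 0.0

results = {0: [5, 2, 9], 1: [7, 1, 3]}
truth   = {0: [5, 9, 4], 1: [7, 3, 8]}
print(round(recall_at_k(results, truth, 3), 3))  # 4 hits of 6 -> 0.667
```

Ground truth is typically produced once with an exact (brute-force) search over a query sample, then reused across all tuning runs.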

Avoid

  • Tuning without measuring recall against a known ground truth set
  • Changing multiple parameters simultaneously without controlled experiments
  • Ignoring memory overhead when increasing M or efSearch values

Frequently Asked Questions

What platforms does this skill support?
Works with Claude, Codex, and Claude Code. Provides general guidance with Qdrant-specific examples.
What are the main limits of the templates?
Templates are Python examples requiring libraries like hnswlib and sklearn to run. Users must provide their own data and queries.
Can I integrate this into my pipeline?
Yes. Use templates as building blocks in benchmarking scripts, CI jobs, or performance testing workflows.
Does it access or send my data?
No. The skill content is static documentation. No data collection or network calls occur from the skill itself.
What if benchmark results are noisy?
Increase query sample size, fix random seeds, and separate index build timing from search timing measurements.
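The advice above (fixed seeds, separated build and search timing) can be sketched as follows. A brute-force scan stands in for the index so the pattern runs without hnswlib; swap in your real build and search calls:

```python
# Separate build timing from search timing, with a fixed seed for a
# reproducible query sample. Brute-force scan is a stand-in index.
import random
import time

random.seed(42)  # fixed seed -> same query sample every run
data = [[random.random() for _ in range(8)] for _ in range(1000)]
queries = random.sample(data, 50)

t0 = time.perf_counter()
index = list(data)                 # stand-in for a real index build
build_s = time.perf_counter() - t0

t0 = time.perf_counter()
for q in queries:                  # stand-in for index.search(q, k)
    min(index, key=lambda v: sum((a - b) ** 2 for a, b in zip(q, v)))
search_s = time.perf_counter() - t0

print(f"build: {build_s*1e3:.2f} ms, search: {search_s*1e3:.2f} ms")
```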
How does this compare to generic tuning guides?
Provides concrete Python templates, parameter ranges, memory estimation formulas, and Qdrant-specific configurations.

Developer Details

File structure

📄 SKILL.md