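Vector search systems often run into latency and memory problems at scale. This skill provides proven HNSW tuning patterns and quantization strategies for balancing recall, speed, and resource usage.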
Download the skill ZIP
Upload it in Claude
Go to Settings → Capabilities → Skills → Upload skill
Turn it on and start using it
Test it
Using "vector-index-tuning". Recommend HNSW parameters for 100K vectors targeting 95% recall
Expected result:
For 100K vectors with 95% recall target: M=32 (increased connections for better graph connectivity), efConstruction=200 (thorough index building), efSearch=128 (balanced search quality). Expected memory: ~250MB for FP32 vectors. Build time: 30-60 seconds. Search latency: 5-15ms at p99.
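The sample answer above does not state a vector dimension, so its memory figure cannot be checked directly. A minimal sketch of a back-of-envelope estimate follows. It assumes FP32 vectors, about 2×M four-byte links per node on the base layer, and it ignores upper layers and allocator overhead. The function name and the dimensions tried are hypothetical.

```python
def hnsw_memory_bytes(num_vectors, dim, m, bytes_per_component=4, bytes_per_link=4):
    """Rough HNSW footprint: raw vectors plus base-layer graph links.

    Assumption: each node keeps about 2 * m links on layer 0; upper
    layers and allocator overhead are ignored.
    """
    vector_bytes = num_vectors * dim * bytes_per_component
    link_bytes = num_vectors * 2 * m * bytes_per_link
    return vector_bytes + link_bytes


# Hypothetical dimensions, chosen only to show how the estimate moves.
for dim in (384, 512, 768):
    total = hnsw_memory_bytes(100_000, dim, m=32)
    print(f"dim={dim}: about {total / 1e6:.0f} MB")
```

Under these assumptions the vectors dominate the footprint and M adds a fixed cost per node, so raising M from 16 to 32 costs far less memory than doubling the dimension.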
Using "vector-index-tuning". How much memory does INT8 quantization save for 1M 768-dim vectors?
Expected result:
FP32 baseline: 1M × 768 × 4 bytes = 3.07 GB (2.86 GiB). INT8 quantized: 1M × 768 × 1 byte = 768 MB (732 MiB). Memory reduction: 75% (about 2.3 GB saved). Recall impact: typically 1-3% degradation. Recommended for latency-sensitive applications with moderate recall requirements.
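The arithmetic above can be reproduced, together with the basic idea behind INT8 scalar quantization, in a short sketch. It assumes symmetric per-vector scaling into the range [-127, 127], which is only one of several schemes. The helper names are hypothetical, and the per-vector scale factor (a few extra bytes) is left out of the memory totals. Sizes are printed in decimal units.

```python
def quantize_int8(vector):
    """Symmetric scalar quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(x) for x in vector) or 1.0  # avoid dividing by zero
    scale = max_abs / 127.0
    codes = [round(x / scale) for x in vector]
    return codes, scale


def dequantize_int8(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]


def memory_bytes(num_vectors, dim, bytes_per_component):
    return num_vectors * dim * bytes_per_component


vec = [0.12, -0.50, 0.33, 0.07]
codes, scale = quantize_int8(vec)
restored = dequantize_int8(codes, scale)
max_error = max(abs(a - b) for a, b in zip(vec, restored))
print(codes, f"max reconstruction error: {max_error:.4f}")

fp32 = memory_bytes(1_000_000, 768, 4)
int8 = memory_bytes(1_000_000, 768, 1)
print(f"FP32: {fp32 / 1e9:.2f} GB, INT8: {int8 / 1e6:.0f} MB, saved: {1 - int8 / fp32:.0%}")
```

The reconstruction error per component is bounded by half the scale step, which is why the recall loss is usually small when the component values are not dominated by a few outliers.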
Security audit
Safe. Static analysis flagged 26 patterns, all of them false positives. The skill contains only documentation and Python code examples for vector database optimization. Markdown code fences were misidentified as shell execution. URLs are reference links. Configuration parameter names were misidentified as filesystem operations. No actual security risks exist.
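What you can build
Production search latency optimization
Tune HNSW parameters and enable INT8 quantization to cut p99 latency from 50ms to 10ms while maintaining 95% recall.
Memory-constrained index deployment
Apply product quantization to fit 10 million vectors in 8GB of RAM with an acceptable recall trade-off, for cost-sensitive deployments.
Vector index scaling plan
Choose the appropriate index type and configuration when scaling from 100K to 100M vectors, so that performance stays predictable.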
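For the memory-constrained case above, a quick budget check shows why product quantization is the usual choice. The sketch below assumes 512-dimensional vectors (the figure used in one of the sample prompts on this page), 64 sub-vectors with 8-bit codes, and a budget of 8 × 10^9 bytes. All of these are illustrative assumptions, and graph or index overhead is not counted.

```python
def raw_bytes(num_vectors, dim, bytes_per_component):
    return num_vectors * dim * bytes_per_component


def pq_code_bytes(num_vectors, num_subvectors, bits_per_code=8):
    """Storage for PQ codes: one code per sub-vector per vector."""
    return num_vectors * num_subvectors * bits_per_code // 8


def pq_codebook_bytes(dim, bits_per_code=8, bytes_per_component=4):
    """All sub-quantizer codebooks together: 2^bits centroids spanning dim floats."""
    return (2 ** bits_per_code) * dim * bytes_per_component


NUM_VECTORS = 10_000_000
DIM = 512                   # assumption, not stated in the use case
BUDGET = 8 * 10**9          # assumption: 8GB read as decimal bytes

options = {
    "FP32": raw_bytes(NUM_VECTORS, DIM, 4),
    "INT8": raw_bytes(NUM_VECTORS, DIM, 1),
    "PQ (m=64, 8-bit)": pq_code_bytes(NUM_VECTORS, 64) + pq_codebook_bytes(DIM),
}
for name, size in options.items():
    verdict = "fits" if size <= BUDGET else "does not fit"
    print(f"{name}: {size / 1e9:.2f} GB, {verdict}")
```

Under these assumptions INT8 also fits the budget, but leaves much less headroom for the index graph and the operating system than PQ does, so the choice comes down to how much recall loss is acceptable.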
Try these prompts
I have 500,000 vectors with 768 dimensions. I need 95% recall at p99 latency under 20ms with 16GB memory budget. Recommend HNSW parameters and quantify expected memory usage.
Compare INT8 scalar quantization vs Product Quantization for my use case: 10M vectors, 512 dimensions, must fit in 8GB RAM, minimum 90% recall required. Include code to implement the recommended approach.
Generate a complete Qdrant collection configuration optimized for high-recall search on 5M product embedding vectors. Include HNSW settings, quantization config, and optimizer thresholds with explanations.
Design a benchmarking plan to evaluate HNSW parameter sweeps. I have 1M vectors, 10K query samples with ground truth labels. Include metrics to track, parameter ranges to test, and criteria for selecting the winning configuration.
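The benchmarking prompt above boils down to three pieces: a recall metric, a tail-latency metric, and a selection rule. A minimal sketch follows, assuming recall@k against ground-truth ids, nearest-rank p99, and a rule that picks the lowest-latency configuration meeting both constraints. The sweep rows are placeholder numbers for illustration only, not measurements, and the function names are hypothetical.

```python
def recall_at_k(retrieved_ids, truth_ids, k):
    """Fraction of the true top-k ids that appear in the retrieved top-k."""
    return len(set(retrieved_ids[:k]) & set(truth_ids[:k])) / k


def p99(latencies_ms):
    """Nearest-rank 99th percentile."""
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil of 0.99 * n
    return ordered[rank - 1]


def pick_config(results, min_recall, max_p99_ms):
    """Lowest-latency config meeting both constraints; memory breaks ties."""
    eligible = [r for r in results
                if r["recall"] >= min_recall and r["p99_ms"] <= max_p99_ms]
    if not eligible:
        return None
    return min(eligible, key=lambda r: (r["p99_ms"], r["memory_mb"]))


# Placeholder sweep results for illustration only, not real measurements.
sweep = [
    {"m": 16, "ef_search": 64, "recall": 0.93, "p99_ms": 6.0, "memory_mb": 3300},
    {"m": 32, "ef_search": 128, "recall": 0.96, "p99_ms": 11.0, "memory_mb": 3400},
    {"m": 48, "ef_search": 256, "recall": 0.98, "p99_ms": 24.0, "memory_mb": 3500},
]
winner = pick_config(sweep, min_recall=0.95, max_p99_ms=20.0)
print(winner)
```

Fixing the selection rule before running the sweep keeps the comparison honest: the winning configuration is the cheapest one that satisfies the targets, not the one with the highest recall.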
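Best practices
- Benchmark with real production queries rather than synthetic data, to capture actual workload patterns
- Start from the default HNSW parameters and tune only when metrics show that optimization is needed
- Monitor recall continuously in production, because data drift can degrade search quality over time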
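One way to monitor recall in production is to replay a small sample of live queries against an exact brute-force search and compare the results with what the index returned. The sketch below assumes dot-product similarity and an in-memory dict of vectors. `ann_search` is a hypothetical callable standing in for whatever index client is in use.

```python
import heapq


def exact_top_k(query, vectors, k):
    """Brute-force top-k ids by dot product, used as ground truth."""
    scored = ((sum(q * v for q, v in zip(query, vec)), idx)
              for idx, vec in vectors.items())
    return [idx for _, idx in heapq.nlargest(k, scored)]


def sampled_recall(sample_queries, vectors, ann_search, k):
    """Mean recall@k of ann_search over a sample of queries."""
    total = 0.0
    for query in sample_queries:
        truth = set(exact_top_k(query, vectors, k))
        found = set(ann_search(query, k))
        total += len(truth & found) / k
    return total / len(sample_queries)


# Toy data; a real check would sample logged production queries.
vectors = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.7, 0.7]}
queries = [[1.0, 0.0]]


def stub_ann_search(query, k):
    # Hypothetical stand-in that always returns ids 0 and 1.
    return [0, 1][:k]


print(sampled_recall(queries, vectors, stub_ann_search, k=2))
```

Brute-force search is slow, so this kind of check is best run on a small sample at a fixed interval, with an alert when the measured recall drops below the target.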
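Avoid
- Reindexing production systems without a rollback plan and validation in a staging environment
- Over-optimizing recall at the expense of latency SLAs and memory budgets
- Skipping index warm-up after deployment, which causes cold-start latency spikes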