hybrid-search-implementation
Implement Hybrid Search for RAG
Pure vector search misses exact matches, while keyword search fails on semantic queries. This skill provides ready-to-use templates for combining both approaches using RRF fusion, linear combination, and cross-encoder reranking.
Download the skill ZIP
Upload in Claude
Go to Settings → Capabilities → Skills → Upload skill
Toggle on and start using
Test it
Using "hybrid-search-implementation". How do I combine vector and keyword search results?
Expected outcome:
- Use Reciprocal Rank Fusion (RRF):
- 1. Run both searches to get ranked lists
- 2. For each result, calculate score = sum(weight / (k + rank))
- 3. Fuse by sorting combined scores
- Example:
- vector_results = [('doc1', 0.9), ('doc2', 0.8)]
- keyword_results = [('doc2', 0.95), ('doc3', 0.7)]
- fused = rrf_fusion([vector_results, keyword_results])
- # Returns approximately [('doc2', 0.0325), ('doc1', 0.0164), ('doc3', 0.0161)] with k=60
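The three steps above fit in a few lines of plain Python. This is a minimal sketch of RRF, not the skill's bundled code; `rrf_fusion` and its uniform weighting (weight = 1 per list) are assumptions for illustration.

```python
def rrf_fusion(result_lists, k=60):
    """Reciprocal Rank Fusion: each document scores sum(1 / (k + rank))
    over every ranked list it appears in, then results sort by fused score."""
    scores = {}
    for results in result_lists:
        # enumerate from 1 so the top result gets rank 1, not 0
        for rank, (doc_id, _raw_score) in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

vector_results = [('doc1', 0.9), ('doc2', 0.8)]
keyword_results = [('doc2', 0.95), ('doc3', 0.7)]
fused = rrf_fusion([vector_results, keyword_results])
# doc2 wins: appearing in both lists outweighs doc1's first place in one list
```

Note that RRF ignores the raw scores entirely and uses only ranks, which is why no normalization is needed.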
Using "hybrid-search-implementation". What's the difference between RRF and linear fusion?
Expected outcome:
- RRF (Reciprocal Rank Fusion):
- - Score = sum(1 / (k + rank))
- - Rank-based, no score normalization needed
- - k=60 is standard constant
- - Works well without tuning
- Linear Combination:
- - Score = alpha * vector_score + (1-alpha) * keyword_score
- - Requires score normalization to [0,1]
- - Alpha needs empirical tuning per dataset
- - More controllable but requires tuning
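For comparison, a linear-combination sketch with min-max normalization. The helper names (`minmax`, `linear_fusion`) and the choice to score missing documents as 0 are assumptions, not the skill's API; the degenerate behavior of min-max on short lists (the lowest score always maps to 0) is one concrete reason linear fusion needs per-dataset tuning.

```python
def minmax(scores):
    """Normalize a dict of raw scores to [0, 1]; a constant list maps to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: (s - lo) / span if span else 1.0 for d, s in scores.items()}

def linear_fusion(vector_results, keyword_results, alpha=0.5):
    """score = alpha * normalized_vector + (1 - alpha) * normalized_keyword.
    A document absent from one list contributes 0 for that component."""
    v = minmax(dict(vector_results)) if vector_results else {}
    kw = minmax(dict(keyword_results)) if keyword_results else {}
    docs = set(v) | set(kw)
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

ranked = linear_fusion([('doc1', 0.9), ('doc2', 0.8)],
                       [('doc2', 0.95), ('doc3', 0.7)], alpha=0.6)
```

With these toy inputs, min-max pushes doc2's vector score to 0 and doc3's keyword score to 0, so the ranking is driven entirely by the two top-scored documents and the alpha weight.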
Using "hybrid-search-implementation". Should I add reranking to my hybrid search?
Expected outcome:
- Yes, reranking improves quality significantly:
- 1. Fetch more candidates from hybrid search (50-100)
- 2. Use cross-encoder to score query-document pairs
- 3. Return top-k reranked results
- Trade-offs:
- - Pros: better precision and relevance
- - Cons: additional latency (cross-encoder inference)
- - Model: cross-encoder/ms-marco-MiniLM-L-6-v2 is a good starting point
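The rerank step can be sketched with a pluggable scorer. In production the scorer would be a real cross-encoder (e.g. `CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2').predict` from the sentence-transformers library); here a toy lexical-overlap function stands in so the sketch stays dependency-free. `rerank`, `overlap_score`, and the candidate tuple shape are all hypothetical.

```python
def rerank(query, candidates, score_fn, top_k=10):
    """Score every (query, document) pair with score_fn and keep the top_k.
    score_fn is any callable (query, doc_text) -> float; in production it
    would wrap a cross-encoder's predict call."""
    scored = [(doc_id, score_fn(query, text)) for doc_id, text in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

def overlap_score(query, text):
    """Toy stand-in for a cross-encoder: fraction of query terms in the doc."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(len(q), 1)

candidates = [
    ('doc1', 'vector search with embeddings'),
    ('doc2', 'hybrid search combines vector and keyword search'),
    ('doc3', 'keyword matching for exact terms'),
]
top = rerank('hybrid vector keyword search', candidates, overlap_score, top_k=2)
```

The pipeline shape is the point: over-fetch 50-100 candidates from hybrid search, score each pair, return the top-k. Swapping the toy scorer for a cross-encoder changes only `score_fn`.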
Security Audit
Safe. All static findings are false positives. The skill contains documentation templates for hybrid search algorithms (RRF, linear fusion) with PostgreSQL, Elasticsearch, and custom RAG pipelines. The static scanner misidentified mathematical formulas as crypto operations, markdown code fences as command execution, and benign terminology as security risks. No malicious code or credential exfiltration is present.
Risk Factors
⚡ Contains scripts (1)
📁 Filesystem access (1)
What You Can Build
Build RAG Systems with Better Recall
Combine semantic understanding with exact matching to improve document retrieval for LLM context. Handle queries that need both conceptual similarity and specific terminology.
Implement Enterprise Search
Create search systems that find both semantically related content and documents containing exact terms like product codes, names, or identifiers.
Improve Search Quality Metrics
Apply fusion techniques like RRF to boost recall without sacrificing precision. Log individual scores to debug and tune search quality.
Try These Prompts
Help me implement Reciprocal Rank Fusion to combine vector and keyword search results. I have two lists of (doc_id, score) tuples. Show me how to fuse them.
Show me how to set up a PostgreSQL table with pgvector for embeddings and tsvector for full-text search. Include the HNSW and GIN index definitions.
Help me write an Elasticsearch hybrid search query that combines dense vector kNN with BM25 text matching using the RRF rank feature.
Create a complete HybridRAGPipeline class that executes vector and keyword searches in parallel, fuses results with configurable methods (RRF or linear), and optionally reranks with a cross-encoder.
Best Practices
- Start with RRF fusion as it works well without parameter tuning. Use k=60 as the standard constant.
- Fetch more candidates from individual searches (3x the final limit) before fusion to ensure good recall.
- Log both vector and keyword scores separately during development. This helps debug when results are missing.
- Use cross-encoder reranking for production systems. The quality improvement is significant.
Avoid
- Don't assume a single fusion weight works for all queries. Some queries need more semantic matching while others need keyword matching.
- Don't skip keyword search entirely. Exact term matching handles names, codes, and specific phrases better than vectors.
- Don't over-fetch candidates. Balance recall needs against latency. 50-100 candidates before reranking is usually sufficient.