Retrieval-augmented generation (RAG) systems hinge on finding the right context before generating answers. Yet conventional vector search often fails when users ask precise technical questions using exact terminology. The solution? Hybrid search merges the strengths of keyword-based ranking with semantic retrieval to cover both scenarios effectively.
## The Core Problem: Vector Search’s Blind Spot
Consider a knowledge base containing the sentence: "For Chinese scenarios, we recommend BAAI/bge-large-zh-v1.5, with a vector dimension of 1024." If a user asks, "What is the vector dimension of BAAI/bge-large-zh-v1.5?", you might assume a vector search would handle this effortlessly — after all, the query and document share identical words. In reality, vector search prioritizes semantic similarity over exact matches, making it less effective for questions involving specific terms like model names, parameters, or formulas.
This gap becomes evident when systems face two distinct query types:
- Keyword queries: Target exact strings like "BAAI/bge-large-zh-v1.5 dimension" or "RRF score formula."
- Semantic queries: Phrase questions conceptually, such as "How do I fix outdated AI responses?"
Vector search thrives with semantic queries but stumbles on keyword-heavy questions. BM25, the algorithm powering most search engines, excels at the opposite. Neither approach alone covers both use cases reliably.
## How BM25 Works: Precision Over Semantics
BM25 (Best Match 25) is the backbone of search engines like Elasticsearch and Apache Lucene. Its formula balances three key factors:
```
score(D, Q) = Σ IDF(qi) × [TF(qi, D) × (k1 + 1)] / [TF(qi, D) + k1 × (1 - b + b × |D|/avgdl)]
```

Breaking it down:
- IDF (Inverse Document Frequency): Rare terms like "BAAI/bge-large-zh-v1.5" receive higher scores because they appear infrequently across documents.
- TF (Term Frequency): Repeated terms boost relevance, but with diminishing returns to prevent overcounting.
- Document length normalization: Long documents don’t automatically outrank shorter ones just because they contain more words.
BM25’s strengths lie in exact matches. If a query includes a product name, parameter, or formula, BM25 will likely retrieve it accurately. However, it lacks semantic understanding — it won’t recognize that "knowledge cutoff" and "AI that doesn’t know recent events" convey the same idea.
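To make the formula concrete, here is a minimal, self-contained BM25 scorer. The toy corpus and hand-tokenized documents are illustrative assumptions, not the experiment's data, and the `k1`/`b` defaults are just common starting values; production engines like Lucene use tuned implementations:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a query using the BM25 formula."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n              # average document length
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)         # document frequency
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # rare terms weigh more
        norm = k1 * (1 - b + b * len(doc_terms) / avgdl) # length normalization
        score += idf * (tf[term] * (k1 + 1)) / (tf[term] + norm)
    return score

# Toy corpus: the exact model name is a rare, high-IDF token.
corpus = [
    ["vector", "dimension", "of", "BAAI/bge-large-zh-v1.5", "is", "1024"],
    ["semantic", "search", "uses", "dense", "embeddings"],
]
query = ["BAAI/bge-large-zh-v1.5", "dimension"]
scores = [bm25_score(query, doc, corpus) for doc in corpus]
```

The first document shares both query terms and scores positive; the second shares none, so every term's frequency is zero and its score is exactly zero, illustrating both BM25's precision on exact identifiers and its total blindness to semantics.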
## Merging Results: The RRF Fusion Advantage
Combining BM25 and vector search requires a method to reconcile their disparate scoring scales. A simple weighted average won’t work because the algorithms use entirely different metrics. Enter Reciprocal Rank Fusion (RRF), which merges results based on rank rather than score.
The RRF formula calculates a score for each document:
```
RRF_score(d) = Σ 1 / (k + rank(d))
```

- rank(d): The document’s position in a retriever’s results (e.g., 1st, 2nd).
- k: A tuning constant (typically 60) that moderates the influence of top-ranked items.
Example:
| Document | BM25 Rank | Vector Rank | RRF Score (k=60) |
|----------|-----------|-------------|------------------|
| doc-006  | 1         | 3           | 0.0323           |
| doc-003  | 3         | 1           | 0.0323           |
| doc-002  | 2         | 4           | 0.0318           |
RRF ensures fairness by comparing ranks, not raw scores. This means documents appearing early in either retriever’s results receive higher combined scores, regardless of the underlying algorithm’s scoring logic.
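The fusion step itself is a few lines of Python. The ranked lists below are illustrative inputs consistent with the ranks in the table above; doc-008 and doc-001 are filler entries for the positions the table doesn't show:

```python
def rrf_fuse(rankings, k=60):
    """Combine ranked result lists with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

bm25_ranking   = ["doc-006", "doc-002", "doc-003", "doc-008"]
vector_ranking = ["doc-003", "doc-001", "doc-006", "doc-002"]
fused = rrf_fuse([bm25_ranking, vector_ranking])
```

Note that `rrf_fuse` never looks at a raw BM25 or cosine score, only at positions, which is exactly why the two incompatible scales can be merged without normalization.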
## Testing Hybrid Search: Keyword vs. Semantic Queries
A controlled experiment evaluated three retrievers (BM25, vector search, and a hybrid approach) on six queries spanning keyword and semantic scenarios. Evaluation used Mean Reciprocal Rank (MRR), where a higher score means the correct document ranked closer to the top.
Query Types and Expected Outcomes:
- Keyword queries:
  - "BAAI/bge-large-zh-v1.5 dimension" → doc-003
  - "RRF score sum 1/(k+rank) formula" → doc-006
  - "chunk_size 256 1024 overlap recommended" → doc-004
- Semantic queries:
  - "My AI assistant gives outdated answers, how do I keep it current?" → doc-001
  - "Multiple teams share one Q&A system — how to keep their data separate?" → doc-008
  - "Rephrasing the same question returns different results — how to fix this?" → doc-007
The hybrid retriever, which fused BM25 and vector results via RRF, consistently outperformed or matched the individual approaches. For keyword-heavy queries, BM25 often led, while vector search handled semantic queries better. The hybrid model balanced both strengths without manual score adjustments.
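MRR itself is simple to compute. This sketch uses made-up result lists (not the experiment's actual rankings) purely to show the metric:

```python
def mean_reciprocal_rank(retrieved, expected):
    """Average of 1/rank of the first correct document across queries."""
    total = 0.0
    for query, ranked_docs in retrieved.items():
        rr = 0.0
        for rank, doc_id in enumerate(ranked_docs, start=1):
            if doc_id == expected[query]:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(retrieved)

# Hypothetical results: the correct doc ranks 1st for one query, 2nd for the other.
retrieved = {
    "BAAI/bge-large-zh-v1.5 dimension": ["doc-003", "doc-004", "doc-006"],
    "How do I fix outdated AI responses?": ["doc-002", "doc-001", "doc-008"],
}
expected = {
    "BAAI/bge-large-zh-v1.5 dimension": "doc-003",
    "How do I fix outdated AI responses?": "doc-001",
}
mrr = mean_reciprocal_rank(retrieved, expected)  # (1/1 + 1/2) / 2 = 0.75
```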
## Practical Implementation: Code Walkthrough
Deploying hybrid search requires configuring three components: BM25, vector search, and the fusion mechanism.
### 1. BM25 Retriever (with Chinese Text Support)
For non-English documents, tokenization is crucial. Chinese text, for example, benefits from word segmentation using the jieba library:
```python
import jieba
from langchain_community.retrievers import BM25Retriever

def chinese_tokenizer(text: str) -> list[str]:
    return list(jieba.cut(text))

bm25_retriever = BM25Retriever.from_documents(
    docs,
    k=3,
    preprocess_func=chinese_tokenizer,
)
```

### 2. Vector Retriever
Vector search leverages embeddings to capture semantic meaning. Here, the BAAI/bge-large-zh-v1.5 model generates embeddings for Chinese text:
```python
import os

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="BAAI/bge-large-zh-v1.5",
    api_key=os.getenv("EMBEDDING_API_KEY"),
    base_url="…",  # your embedding service endpoint
)

vectorstore = Chroma.from_documents(docs, embedding=embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
```

### 3. Hybrid Retriever
The EnsembleRetriever class from LangChain Classic merges the two retrievers using RRF. The weights parameter (e.g., [0.5, 0.5]) influences each retriever’s contribution to the fusion process, though the actual scoring remains rank-based:
```python
from langchain_classic.retrievers import EnsembleRetriever

hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.5, 0.5],
)
```

## The Path Forward: Beyond Basic Fusion
Hybrid search is not a one-size-fits-all solution. Future improvements could include:
- Dynamic weighting: Adjust retriever contributions based on query type (e.g., favor BM25 for keyword-heavy questions).
- Query classification: Use a classifier to detect keyword vs. semantic queries and route them to the most suitable retriever.
- Context-aware fusion: Prioritize retrievers that perform best for specific document domains or languages.
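As a sketch of the first two ideas, a lightweight heuristic classifier could set the ensemble weights per query. The regex patterns and weight values below are illustrative assumptions, not a vetted rule set:

```python
import re

def looks_like_keyword_query(query: str) -> bool:
    """Crude heuristic: exact identifiers, parameters, versions, or formula
    fragments suggest a keyword query; otherwise treat it as semantic."""
    patterns = [
        r"[A-Za-z0-9]+/[A-Za-z0-9._-]+",  # model IDs like BAAI/bge-large-zh-v1.5
        r"\b\w+_\w+\b",                   # snake_case parameters like chunk_size
        r"\d+\.\d+",                      # version or numeric literals
        r"[=*+]|\d+\s*\(",                # formula fragments
    ]
    return any(re.search(p, query) for p in patterns)

def choose_weights(query: str) -> list[float]:
    # Favor BM25 for keyword-heavy queries, vector search otherwise.
    return [0.7, 0.3] if looks_like_keyword_query(query) else [0.3, 0.7]
```

The returned weights would be passed straight into the `EnsembleRetriever` constructor; a learned classifier could replace the regexes without changing the interface.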
As RAG systems grow more sophisticated, hybrid search stands out as a practical way to bridge the gap between precision and semantic understanding. By combining the best of both worlds, developers can build retrieval pipelines that handle the full spectrum of user queries with confidence.