iToverDose / Software · 8 MAY 2026 · 04:06

Why Hybrid Search Beats Vector Alone in RAG Systems

Pure vector search excels at semantic queries but stumbles on exact terms like model names. Combining BM25 with vector retrieval closes this gap, delivering reliable results for both types of searches.

DEV Community · 4 min read

Retrieval-augmented generation (RAG) systems hinge on finding the right context before generating answers. Yet conventional vector search often fails when users ask precise technical questions using exact terminology. The solution? Hybrid search merges the strengths of keyword-based ranking with semantic retrieval to cover both scenarios effectively.

The Core Problem: Vector Search’s Blind Spot

Consider a knowledge base containing the sentence: "For Chinese scenarios, we recommend BAAI/bge-large-zh-v1.5, with a vector dimension of 1024." If a user queries "What is the vector dimension of BAAI/bge-large-zh-v1.5?", you might assume a vector search would handle this effortlessly — after all, the query and document share identical words. In reality, vector search prioritizes semantic similarity over exact matches, making it less effective for questions involving specific terms like model names, parameters, or formulas.

This gap becomes evident when systems face two distinct query types:

  • Keyword queries: Target exact strings like "BAAI/bge-large-zh-v1.5 dimension" or "RRF score formula."
  • Semantic queries: Phrase questions conceptually, such as "How do I fix outdated AI responses?"

Vector search thrives with semantic queries but stumbles on keyword-heavy questions. BM25, the algorithm powering most search engines, excels at the opposite. Neither approach alone covers both use cases reliably.

How BM25 Works: Precision Over Semantics

BM25 (Best Match 25) is the backbone of search engines like Elasticsearch and Apache Lucene. Its formula balances three key factors:

score(D, Q) = Σ IDF(qi) × [TF(qi, D) × (k1 + 1)] / [TF(qi, D) + k1 × (1 - b + b × |D|/avgdl)]

Breaking it down:

  • IDF (Inverse Document Frequency): Rare terms like "BAAI/bge-large-zh-v1.5" receive higher scores because they appear infrequently across documents.
  • TF (Term Frequency): Repeated terms boost relevance, but with diminishing returns to prevent overcounting.
  • Document length normalization: Long documents don’t automatically outrank shorter ones just because they contain more words.

BM25’s strengths lie in exact matches. If a query includes a product name, parameter, or formula, BM25 will likely retrieve it accurately. However, it lacks semantic understanding — it won’t recognize that "knowledge cutoff" and "AI that doesn’t know recent events" convey the same idea.
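
The formula above can be sketched in a few lines of pure Python. This is an illustrative toy, not the tuned Lucene implementation: the corpus, the whitespace tokenizer, and the log-based IDF variant are all assumptions made for the demo.

```python
import math
from collections import Counter

def bm25_scores(query: str, corpus: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document in corpus against query using the BM25 formula."""
    docs = [doc.split() for doc in corpus]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.split():
            df = sum(1 for other in docs if term in other)       # document frequency
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)      # rare terms -> larger IDF
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)  # length normalization
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

corpus = [
    "BAAI/bge-large-zh-v1.5 has a vector dimension of 1024",
    "vector search compares embeddings by cosine similarity",
    "embeddings capture the meaning of a sentence",
]
print(bm25_scores("BAAI/bge-large-zh-v1.5 dimension", corpus))
```

Because "BAAI/bge-large-zh-v1.5" appears in only one document, its IDF is high and that document dominates the ranking — exactly the exact-match behavior described above.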

Merging Results: The RRF Fusion Advantage

Combining BM25 and vector search requires a method to reconcile their disparate scoring scales. A simple weighted average won’t work because the algorithms use entirely different metrics. Enter Reciprocal Rank Fusion (RRF), which merges results based on rank rather than score.

The RRF formula calculates a score for each document:

RRF_score(d) = Σ 1 / (k + rank(d))
  • rank(d): The document’s position in a retriever’s results (e.g., 1st, 2nd).
  • k: A tuning constant (typically 60) to moderate the influence of top-ranked items.

Example:

| Document | BM25 Rank | Vector Rank | RRF Score (k=60) |
|----------|-----------|-------------|------------------|
| doc-006  | 1         | 3           | 0.0323           |
| doc-003  | 3         | 1           | 0.0323           |
| doc-002  | 2         | 4           | 0.0318           |

RRF ensures fairness by comparing ranks, not raw scores. This means documents appearing early in either retriever’s results receive higher combined scores, regardless of the underlying algorithm’s scoring logic.
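
The fusion logic is short enough to show in full. The sketch below implements the RRF formula directly; the two ranked lists are hypothetical, chosen only so their ranks match the table above.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge ranked lists: each appearance adds 1 / (k + rank) to the document's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: -kv[1])

bm25_ranked = ["doc-006", "doc-002", "doc-003"]               # BM25 ranks 1..3
vector_ranked = ["doc-003", "doc-005", "doc-006", "doc-002"]  # vector ranks 1..4
for doc_id, score in rrf_fuse([bm25_ranked, vector_ranked]):
    print(doc_id, round(score, 4))
```

Note that doc-006 (BM25 rank 1, vector rank 3) and doc-003 (the mirror image) tie exactly — RRF only cares about where a document lands, not which retriever put it there.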

Testing Hybrid Search: Keyword vs. Semantic Queries

A controlled experiment evaluated three retrievers—BM25, vector search, and a hybrid approach—on six queries spanning keyword and semantic scenarios. The metrics focused on Mean Reciprocal Rank (MRR), where a higher score indicates the correct document ranked closer to the top.

Query Types and Expected Outcomes:

  • Keyword queries:
      • "BAAI/bge-large-zh-v1.5 dimension" → doc-003
      • "RRF score sum 1/(k+rank) formula" → doc-006
      • "chunk_size 256 1024 overlap recommended" → doc-004
  • Semantic queries:
      • "My AI assistant gives outdated answers, how do I keep it current?" → doc-001
      • "Multiple teams share one Q&A system — how to keep their data separate?" → doc-008
      • "Rephrasing the same question returns different results — how to fix this?" → doc-007

The hybrid retriever, which fused BM25 and vector results via RRF, consistently outperformed or matched the individual approaches. For keyword-heavy queries, BM25 often led, while vector search handled semantic queries better. The hybrid model balanced both strengths without manual score adjustments.
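
MRR itself is simple to compute: for each query, take the reciprocal of the rank at which the expected document appears, then average across queries. The results dictionary below is a made-up illustration, not the experiment's actual output.

```python
def mean_reciprocal_rank(results: dict[str, list[str]], expected: dict[str, str]) -> float:
    """Average 1/rank of the expected document over all queries (0 contribution if absent)."""
    total = 0.0
    for query, ranked_ids in results.items():
        target = expected[query]
        if target in ranked_ids:
            total += 1.0 / (ranked_ids.index(target) + 1)  # convert 0-based index to 1-based rank
    return total / len(results)

results = {  # hypothetical retriever output, for illustration only
    "BAAI/bge-large-zh-v1.5 dimension": ["doc-003", "doc-001", "doc-004"],
    "RRF score sum 1/(k+rank) formula": ["doc-002", "doc-006", "doc-005"],
}
expected = {
    "BAAI/bge-large-zh-v1.5 dimension": "doc-003",
    "RRF score sum 1/(k+rank) formula": "doc-006",
}
print(mean_reciprocal_rank(results, expected))  # (1/1 + 1/2) / 2 = 0.75
```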

Practical Implementation: Code Walkthrough

Deploying hybrid search requires configuring three components: BM25, vector search, and the fusion mechanism.

1. BM25 Retriever (with Chinese Text Support)

For non-English documents, tokenization is crucial. Chinese text, for example, benefits from word segmentation using the jieba library:

import jieba
from langchain_community.retrievers import BM25Retriever

def chinese_tokenizer(text: str) -> list[str]:
    return list(jieba.cut(text))

bm25_retriever = BM25Retriever.from_documents(
    docs,
    k=3,
    preprocess_func=chinese_tokenizer,
)

2. Vector Retriever

Vector search leverages embeddings to capture semantic meaning. Here, the BAAI/bge-large-zh-v1.5 model generates embeddings for Chinese text:

import os

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="BAAI/bge-large-zh-v1.5",
    api_key=os.getenv("EMBEDDING_API_KEY"),
    base_url=os.getenv("EMBEDDING_BASE_URL"),  # endpoint URL truncated in the original; point this at your embedding provider
)

vectorstore = Chroma.from_documents(docs, embedding=embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

3. Hybrid Retriever

The EnsembleRetriever class from LangChain Classic merges the two retrievers using RRF. The weights parameter (e.g., [0.5, 0.5]) influences each retriever’s contribution to the fusion process, though the actual scoring remains rank-based:

from langchain_classic.retrievers import EnsembleRetriever

hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.5, 0.5],
)

The Path Forward: Beyond Basic Fusion

Hybrid search is not a one-size-fits-all solution. Future improvements could include:

  • Dynamic weighting: Adjust retriever contributions based on query type (e.g., favor BM25 for keyword-heavy questions).
  • Query classification: Use a classifier to detect keyword vs. semantic queries and route them to the most suitable retriever.
  • Context-aware fusion: Prioritize retrievers that perform best for specific document domains or languages.

As RAG systems grow more sophisticated, hybrid search stands out as a practical way to bridge the gap between precision and semantic understanding. By combining the best of both worlds, developers can build retrieval pipelines that handle the full spectrum of user queries with confidence.
