What I learned building a hybrid search engine from scratch

I recently completed Module 2 of the LLM Zoomcamp 2026 by DataTalksClub, and the experience reshaped how I approach search in AI systems. While Module 1 focused on retrieval-augmented generation (RAG) and agentic pipelines, Module 2 drilled into the search layer—revealing that keyword matching is only half the story. The real magic happens when you combine exact-term retrieval with semantic vector search.

Why vector search matters more than keywords

Traditional keyword search relies on literal word matches. If you search for “enroll,” it finds documents containing “enroll” but misses documents using “joining,” “signing up,” or “registration,” even when they mean the same thing. That limits recall, especially when users phrase queries differently from the content.
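To make the limitation concrete, here is a minimal sketch of literal keyword matching. The `tokenize` and `keyword_search` helpers and the three sample documents are made up for illustration. Real keyword engines add stemming and TF-IDF or BM25 weighting, but they still depend on shared tokens.

```python
import re

def tokenize(text):
    # Lowercase and split on non-letters; no stemming, no synonyms
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_search(query, documents):
    # Return indices of documents sharing at least one token with the query
    query_tokens = tokenize(query)
    return [i for i, doc in enumerate(documents) if query_tokens & tokenize(doc)]

# Hypothetical sample documents
documents = [
    "You can enroll until the end of the first week.",
    "Registration closes after the first week.",
    "Signing up is free for everyone.",
]

matches = keyword_search("enroll", documents)
print(matches)  # [0]
```

Only the first document matches, even though all three answer the same question.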

Vector search changes the game by representing text as high-dimensional vectors. Each piece of text is converted into a numerical vector—often 384 or more dimensions—where similar meanings cluster closer together. That means searching for “enroll” can retrieve documents about “onboarding” or “registration,” because their vectors point in similar directions.

This isn’t just academic. It’s the engine behind modern RAG systems, enabling AI assistants to pull contextually relevant information even when the source text uses entirely different wording.

Building a vector search system step by step

I didn’t just read about vector search—I built one from scratch to understand how it works under the hood.

1. Embedding text into vectors efficiently

Instead of loading a full PyTorch stack with CUDA dependencies, I used a lightweight ONNX runtime with a pre-trained model (Xenova/all-MiniLM-L6-v2). The result: identical 384-dimensional vectors, but with a 30x smaller installation footprint and no GPU required.

from embedder import Embedder

embedder = Embedder()  # Loads model via ONNX
v = embedder.encode("How does approximate nearest neighbor search work?")
print(len(v))  # Output: 384

Each dimension in the vector encodes a nuance of meaning. For example, vectors for “enroll in a course” and “sign up for a class” will point in similar directions, while “enroll” and “pizza” will be far apart.

2. Writing vector search logic with NumPy

Before relying on specialized libraries, I implemented core search logic manually to grasp the mechanics. Using cosine similarity on normalized vectors, I compared query vectors against a matrix of document chunk embeddings.

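import numpy as np

def cosine_similarity(a, b):
    # Equals cosine similarity only when a and b are unit-length vectors
    return np.dot(a, b)

# X is the matrix of all chunk embeddings (one normalized row per chunk)
# v is the normalized query vector returned by embedder.encode(...)
scores = X.dot(v)  # One dot product per chunk gives all similarities at once
best_idx = np.argmax(scores)  # Index of the most similar chunk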

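This is the brute-force version of what vector stores like Qdrant and pgvector do internally. The difference is that those systems add approximate nearest neighbor indexes such as HNSW, so they can scale without scoring every vector on each query.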
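The dot product only equals cosine similarity when the vectors have unit length, so it helps to see the normalization step spelled out. The sketch below uses made-up 3-dimensional vectors in place of real 384-dimensional embeddings, and the `normalize` and `top_k` helpers are my own illustrative names, not part of any library.

```python
import numpy as np

def normalize(vectors):
    # Scale each vector to unit length so dot product equals cosine similarity
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / norms

def top_k(X, v, k=3):
    # Score every row of X against v, then return the k best indices and scores
    scores = X @ v
    order = np.argsort(-scores)[:k]
    return order, scores[order]

# Toy "embeddings" (made up; real ones have 384 dimensions)
X = normalize(np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]))
v = normalize(np.array([1.0, 0.2, 0.0]))

indices, scores = top_k(X, v, k=2)
print(indices)  # [1 0]
```

Row 1 wins because it points in nearly the same direction as the query, even though row 0 has the larger first component before normalization.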

3. Chunking documents for better retrieval

Raw documents are too long for precise embedding. A 10,000-character page dilutes semantic signals—key context can get buried. The solution: chunking.

from gitsource import chunk_documents

chunks = chunk_documents(documents, size=2000, step=1000)
# 72 pages → 295 overlapping chunks

Overlapping chunks (with step < size) mean that a sentence cut off at one chunk boundary still appears whole in the neighboring chunk. This improves retrieval accuracy, and sending retrieved chunks instead of full pages reduces LLM input tokens by up to 3x.
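I have not looked at how `chunk_documents` is implemented internally, so the following is only a sketch of the general sliding-window idea. The `sliding_window` function and its dictionary layout are hypothetical.

```python
def sliding_window(text, size, step):
    # Collect overlapping character windows; neighbors share size - step characters
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        chunks.append({"start": start, "content": chunk})
        if start + size >= len(text):
            break  # The last window already reaches the end of the text
    return chunks

text = "x" * 5000  # Stand-in for a 5,000-character page
chunks = sliding_window(text, size=2000, step=1000)
print([c["start"] for c in chunks])  # [0, 1000, 2000, 3000]
```

With size=2000 and step=1000, each chunk shares its second half with the first half of the next one.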

4. Using minsearch’s VectorSearch for clean integration

I wrapped the NumPy logic into a reusable class using the minsearch library. It provides a clean interface for vector search alongside keyword fields.

from minsearch import VectorSearch

vector_index = VectorSearch(keyword_fields=["filename"])
vector_index.fit(X, chunks)
results = vector_index.search(query_vector, num_results=5)

This makes it easy to integrate vector search into larger RAG pipelines without rewriting core math.

Keyword vs. vector search: the showdown

I tested both approaches on a real query: “How do I store vectors in PostgreSQL?”

Keyword search missed 08-pgvector.md because the document used “pgvector” but not the exact phrase “store vectors.”

Vector search ranked 08-pgvector.md first because it recognized the semantic connection between “store vectors” and “pgvector.”

The difference highlights vector search’s strength: it finds meaning, not just words.

Why hybrid search beats either method alone

Neither approach is perfect in isolation:

Vector search excels at semantic understanding but can miss exact terms, names, or technical codes.
Keyword search nails exact matches but fails on paraphrases or synonyms.

The solution? Hybrid search with Reciprocal Rank Fusion (RRF).

RRF combines results from multiple lists by scoring each document based on its rank across all lists—not raw scores, which may not be comparable. A document appearing in the top 3 of both lists beats one that’s first in only one list.

def rrf(result_lists, k=60, num_results=5):
    scores = {}  # Fused score per chunk
    docs = {}    # Chunk key -> chunk record
    for results in result_lists:
        for rank, doc in enumerate(results):  # rank is 0-based here
            # A chunk is identified by its file and start offset
            key = (doc["filename"], doc["start"])
            scores[key] = scores.get(key, 0) + 1 / (k + rank)
            docs[key] = doc
    # Highest fused score first
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [docs[key] for key in ranked[:num_results]]

results = rrf([vector_results, text_results])
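A small worked example shows why agreement between lists matters more than a single first place. The sketch below repeats only the scoring part of the function above, using plain string ids instead of chunk dictionaries. The `rrf_scores` name and the two rankings are made up for illustration.

```python
def rrf_scores(result_lists, k=60):
    # Sum 1 / (k + rank) for each document across all lists (rank starts at 0)
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank)
    return scores

# Hypothetical rankings: "b" is third in both lists,
# while "a" and "x" are each first in only one list
vector_ranking = ["a", "c", "b"]
text_ranking = ["x", "y", "b"]

scores = rrf_scores([vector_ranking, text_ranking])
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0])  # b
```

Here “b” scores 1/62 + 1/62 ≈ 0.0323, while “a” and “x” each score 1/60 ≈ 0.0167, so the document both methods agree on comes out on top.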

In practice, this approach often outperforms either method on its own, though the gain depends on the data and the queries, so it is worth measuring.

Five lessons for building better search systems

Embeddings capture meaning, not words. “Enroll” and “join” produce similar vectors; “pizza” and “enrollment” do not. Semantic search thrives on this distinction.

Chunking is non-negotiable. Full pages dilute relevance. Overlapping 2,000-character chunks improve precision and reduce token waste.

Hybrid search wins in production. Combining keyword and vector search with RRF delivers the best of both worlds.

ONNX enables vector search anywhere. No GPU, no CUDA, no heavy frameworks—just a 67MB model that runs on a basic laptop.

Measure before choosing. Vector search dominates for semantic queries; keyword search wins for exact terms like names or IDs. Test both—and combine them.
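One simple way to act on the last lesson is to score each method against a handful of queries with known answers. The sketch below computes hit rate and mean reciprocal rank (MRR). The function names and sample results are hypothetical, and every filename other than 08-pgvector.md is made up.

```python
def hit_rate(ranked_lists, relevant_ids):
    # Fraction of queries whose relevant document appears anywhere in the results
    hits = sum(rel in ranked for ranked, rel in zip(ranked_lists, relevant_ids))
    return hits / len(relevant_ids)

def mrr(ranked_lists, relevant_ids):
    # Mean reciprocal rank: 1 / position of the relevant document, 0 if missing
    total = 0.0
    for ranked, rel in zip(ranked_lists, relevant_ids):
        if rel in ranked:
            total += 1 / (ranked.index(rel) + 1)
    return total / len(relevant_ids)

# Hypothetical results for two queries, one relevant document each
relevant = ["08-pgvector.md", "03-chunking.md"]
keyword_results = [["01-intro.md", "02-setup.md"], ["03-chunking.md", "01-intro.md"]]
vector_results = [["08-pgvector.md", "01-intro.md"], ["01-intro.md", "03-chunking.md"]]

print(hit_rate(keyword_results, relevant), mrr(keyword_results, relevant))  # 0.5 0.5
print(hit_rate(vector_results, relevant), mrr(vector_results, relevant))    # 1.0 0.75
```

Running the same check on the fused RRF results gives a like-for-like comparison of all three approaches.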

My open-source homework and next steps

All code from my Module 2 journey is available on GitHub:

github.com/Derrick-Ryan-Giggs/llm-zoomcamp-2026

It includes:

vector-search.ipynb — embeddings, Qdrant integration, and vector-based RAG
Vector Search Homework.ipynb — full implementation walkthrough

If you’re exploring AI-powered search, the LLM Zoomcamp remains a free, no-barrier way to learn. No paywalls, no certificate fees—just hands-on labs and community support.

Sign up at DataTalksClub’s repository. Whether you’re building a chatbot, research assistant, or knowledge system, the right search engine can make or break your results. Start experimenting today.
