I recently completed Module 2 of the LLM Zoomcamp 2026 by DataTalksClub, and it reshaped how I approach search in AI systems. Module 1 focused on retrieval-augmented generation (RAG) and agentic pipelines. Module 2 drilled into the search layer and showed that keyword matching is only half the story. The best results come from combining exact-term retrieval with semantic vector search.
Why vector search matters more than keywords
Traditional keyword search relies on literal word matches. If you search for “enroll,” it finds documents containing “enroll” but misses documents using “joining,” “signing up,” or “registration,” even when they mean the same thing. That hurts recall (relevant documents never come back), especially when users phrase queries differently from the content.
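To make the limitation concrete, here is a toy literal-match search over three made-up FAQ lines. It is a deliberately simplified stand-in for real keyword engines (which add stemming, BM25 weighting, and more), not the search code used in the course.

```python
import re

# Toy corpus, made up for illustration
docs = {
    "faq-1": "How do I enroll in the course?",
    "faq-2": "Joining late is fine; registration stays open.",
    "faq-3": "Signing up takes two minutes.",
}

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_search(query, docs):
    """Return ids of documents sharing at least one literal token with the query."""
    q = set(tokenize(query))
    return [doc_id for doc_id, text in docs.items() if q & set(tokenize(text))]

print(keyword_search("enroll", docs))  # ['faq-1']
```

All three lines are about signing up, but only the one containing the literal token “enroll” comes back.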
Vector search instead converts each piece of text into a numerical vector, often with 384 or more dimensions, in which texts with similar meanings sit close together. A search for “enroll” can therefore retrieve documents about “onboarding” or “registration,” because their vectors point in similar directions.
This isn’t just academic. It’s the engine behind modern RAG systems, enabling AI assistants to pull contextually relevant information even when the source text uses entirely different wording.
Building a vector search system step by step
I didn’t just read about vector search—I built one from scratch to understand how it works under the hood.
1\. Embedding text into vectors efficiently
Instead of loading a full PyTorch stack with CUDA dependencies, I used the lightweight ONNX Runtime with an ONNX export of the pre-trained all-MiniLM-L6-v2 model (Xenova/all-MiniLM-L6-v2). The result: essentially the same 384-dimensional vectors, a 30x smaller installation footprint, and no GPU required.
```python
from embedder import Embedder

embedder = Embedder()  # Loads model via ONNX
v = embedder.encode("How does approximate nearest neighbor search work?")
print(len(v))  # 384
```

No single dimension has a readable meaning on its own; meaning is spread across the whole vector. What matters is the geometry: vectors for “enroll in a course” and “sign up for a class” point in similar directions, while “enroll” and “pizza” are far apart.
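The `Embedder` hides how one 384-dimensional sentence vector comes out of a model that produces one vector per token. For all-MiniLM-L6-v2, the sentence-transformers pipeline mean-pools the token outputs (ignoring padding) and then L2-normalizes the result. The sketch below assumes you already have the per-token output array and attention mask; `mean_pool_and_normalize` is a hypothetical helper, not part of the `Embedder` shown above, and the random arrays merely stand in for real ONNX outputs.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average the real (non-padding) token vectors, then scale to unit length.

    token_embeddings: (seq_len, dim) array of per-token model outputs
    attention_mask:   (seq_len,) array with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=0)
    count = max(mask.sum(), 1e-9)  # avoid division by zero on an all-padding input
    pooled = summed / count
    return pooled / np.linalg.norm(pooled)

# Random stand-ins for real ONNX outputs: 8 tokens, the last 2 are padding
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 384)).astype(np.float32)
mask = np.array([1, 1, 1, 1, 1, 1, 0, 0])

v = mean_pool_and_normalize(tokens, mask)
print(v.shape)  # (384,)
```

Normalizing here is what later lets a plain dot product act as cosine similarity.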
2\. Writing vector search logic with NumPy
Before relying on specialized libraries, I implemented core search logic manually to grasp the mechanics. Using cosine similarity on normalized vectors, I compared query vectors against a matrix of document chunk embeddings.
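Why a dot product is enough: cosine similarity divides the dot product by both vector lengths, and normalized vectors have length 1. Scoring every chunk at once then becomes a single matrix-vector product, where row $i$ of $X$ is chunk $i$'s unit-length embedding $\mathbf{x}_i$ and $\mathbf{v}$ is the unit-length query vector:

$$
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} = \mathbf{a} \cdot \mathbf{b} \quad \text{when } \lVert \mathbf{a} \rVert = \lVert \mathbf{b} \rVert = 1,
\qquad
\mathbf{s} = X \mathbf{v}, \quad s_i = \cos(\mathbf{x}_i, \mathbf{v})
$$

For example, $\mathbf{a} = (0.6, 0.8)$ and $\mathbf{b} = (0.8, 0.6)$ are both already unit length, so their similarity is $0.6 \cdot 0.8 + 0.8 \cdot 0.6 = 0.96$.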
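```python
import numpy as np

def cosine_similarity(a, b):
    # Equals cosine similarity only because a and b are already unit-length
    return np.dot(a, b)

# X: (n_chunks, 384) matrix of normalized chunk embeddings
# v: normalized query vector from embedder.encode(...)
scores = X.dot(v)             # one similarity score per chunk
best_idx = np.argmax(scores)  # index of the most similar chunk
```

This brute-force scan is the exact version of what vector databases like Qdrant and pgvector do. At scale, they avoid comparing the query against every stored vector by using approximate nearest-neighbor indexes such as HNSW.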
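`argmax` returns only the single best chunk, while a RAG pipeline usually wants the top few. Here is a minimal top-k sketch that does not assume the inputs are already unit-length, so it normalizes first. `normalize_rows` and `top_k` are illustrative helpers, not course code, and the 2-d vectors are made up to keep the arithmetic readable.

```python
import numpy as np

def normalize_rows(M):
    """Scale each row (or a single vector) to unit length so dot products equal cosines."""
    norms = np.linalg.norm(M, axis=-1, keepdims=True)
    return M / np.clip(norms, 1e-12, None)

def top_k(X, v, k=5):
    """Return (indices, scores) of the k rows of X most similar to v, best first."""
    scores = X @ v
    k = min(k, len(scores))
    # argpartition isolates the k largest scores in linear time; then sort only those k
    idx = np.argpartition(-scores, k - 1)[:k]
    idx = idx[np.argsort(-scores[idx])]
    return idx, scores[idx]

# Toy 2-d "embeddings" (made up) standing in for the real 384-d chunk matrix
X = normalize_rows(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]))
v = normalize_rows(np.array([1.0, 0.2]))

idx, sims = top_k(X, v, k=2)
print(idx)  # [0 2]
```

Partitioning before sorting matters once there are many chunks: only k scores get fully sorted.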
3\. Chunking documents for better retrieval
Raw documents are too long for precise embedding. A 10,000-character page dilutes the semantic signal: its key passages get blended with everything else on the page into a single vector. The solution is chunking.
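```python
from gitsource import chunk_documents

# Windows of 2,000 characters, starting every 1,000 characters
chunks = chunk_documents(documents, size=2000, step=1000)
# 72 pages → 295 overlapping chunks
```

Because step < size, neighboring chunks overlap by 1,000 characters, so a sentence cut at one chunk's boundary still appears whole in its neighbor. Smaller chunks improve retrieval accuracy and can reduce LLM input tokens by up to 3x, since the prompt carries a few relevant chunks instead of entire pages.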
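I don't reproduce gitsource's implementation here. As an illustration of the same size/step idea, here is a hypothetical `sliding_window_chunks` helper that works on a single string and tags each chunk with its character offset.

```python
def sliding_window_chunks(text, size=2000, step=1000):
    """Split text into windows of `size` characters, starting every `step` characters.

    Each chunk records its start offset, so a passage can be identified later
    (for example, by a (filename, start) key).
    """
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start": start, "content": text[start:start + size]})
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

page = "x" * 5000
chunks = sliding_window_chunks(page, size=2000, step=1000)
print([c["start"] for c in chunks])  # [0, 1000, 2000, 3000]
```

Stopping once a window reaches the end avoids emitting short tail chunks that are fully contained in the previous window.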
4\. Using minsearch’s VectorSearch for clean integration
Next, I replaced the hand-written NumPy code with the VectorSearch class from the minsearch library. It provides a clean interface for vector search alongside keyword fields.
```python
from minsearch import VectorSearch

# "filename" is kept as a keyword field for exact-match filtering
vector_index = VectorSearch(keyword_fields=["filename"])
vector_index.fit(X, chunks)  # X: chunk embeddings; chunks: the matching chunk dicts
results = vector_index.search(query_vector, num_results=5)
```

This makes it easy to plug vector search into larger RAG pipelines without rewriting the core math.
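As a rough mental model of what such a class does, here is a hypothetical `TinyVectorIndex` that mimics the fit/search shape above: brute-force cosine scores plus exact-match filtering on keyword fields. It is a sketch under those assumptions, not minsearch's actual code or full API, and the file names and vectors in the usage lines are made up.

```python
import numpy as np

class TinyVectorIndex:
    """Hypothetical minimal vector index: brute-force cosine search plus exact-match filters."""

    def __init__(self, keyword_fields=()):
        self.keyword_fields = list(keyword_fields)

    def fit(self, vectors, docs):
        vectors = np.asarray(vectors, dtype=np.float32)
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        self.vectors = vectors / np.clip(norms, 1e-12, None)  # unit-length rows
        self.docs = list(docs)
        return self

    def search(self, query_vector, filter_dict=None, num_results=10):
        q = np.asarray(query_vector, dtype=np.float32)
        q = q / max(np.linalg.norm(q), 1e-12)
        scores = self.vectors @ q
        # Exclude documents whose keyword fields don't match the filter exactly
        for field, value in (filter_dict or {}).items():
            if field not in self.keyword_fields:
                raise ValueError(f"{field!r} is not a keyword field")
            mask = np.array([d.get(field) == value for d in self.docs])
            scores = np.where(mask, scores, -np.inf)
        order = np.argsort(-scores)[:num_results]
        return [self.docs[i] for i in order if np.isfinite(scores[i])]

docs = [
    {"filename": "a.md", "start": 0},
    {"filename": "b.md", "start": 0},
    {"filename": "a.md", "start": 1000},
]
vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
index = TinyVectorIndex(keyword_fields=["filename"]).fit(vecs, docs)
hits = index.search([1.0, 0.0], num_results=2)
```

Filtering by setting scores to negative infinity keeps the ranking logic in one place: filtered-out documents simply sort last and are dropped.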
Keyword vs. vector search: the showdown
I tested both approaches on a real query: “How do I store vectors in PostgreSQL?”
- Keyword search missed `08-pgvector.md` because the document used “pgvector” but not the exact phrase “store vectors.”
- Vector search ranked `08-pgvector.md` first because it recognized the semantic connection between “store vectors” and “pgvector.”
The difference highlights vector search’s strength: it finds meaning, not just words.
Why hybrid search beats either method alone
Neither approach is perfect in isolation:
- Vector search excels at semantic understanding but can miss exact terms, names, or technical codes.
- Keyword search nails exact matches but fails on paraphrases or synonyms.
The solution? Hybrid search with Reciprocal Rank Fusion (RRF).
RRF merges several ranked lists using only each document's rank in each list, not its raw score, because keyword scores and cosine similarities live on different scales and can't be compared directly. Each list adds 1 / (k + rank) to a document's total, so a document appearing in the top 3 of both lists beats one that's first in only one list.
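Written out, with $R$ the set of result lists, $\mathrm{rank}_r(d)$ the 1-based position of document $d$ in list $r$, and $k$ a smoothing constant (60 is the common default):

$$
\mathrm{RRF}(d) = \sum_{r \in R} \frac{1}{k + \mathrm{rank}_r(d)}
$$

With $k = 60$, a chunk ranked third in both lists scores $2/(60 + 3) \approx 0.0317$, while a chunk ranked first in only one list scores $1/(60 + 1) \approx 0.0164$, so the chunk both methods agree on wins.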
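```python
def rrf(result_lists, k=60, num_results=5):
    """Reciprocal Rank Fusion over several ranked lists of chunk dicts."""
    scores = {}
    docs = {}
    for results in result_lists:
        # Ranks start at 1, matching the standard RRF term 1 / (k + rank)
        for rank, doc in enumerate(results, start=1):
            # A chunk is identified by its source file and character offset
            key = (doc["filename"], doc["start"])
            scores[key] = scores.get(key, 0) + 1 / (k + rank)
            docs[key] = doc
    # Highest fused score first
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [docs[key] for key in ranked[:num_results]]

results = rrf([vector_results, text_results])
```

In practice, fusing the two lists usually beats either one alone, which is why RRF is a common default for hybrid search in production systems.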
Five lessons for building better search systems
- Embeddings capture meaning, not words. “Enroll” and “join” produce similar vectors; “pizza” and “enrollment” do not. Semantic search thrives on this distinction.
- Chunking is non-negotiable. Full pages dilute relevance. Overlapping 2,000-character chunks improve precision and reduce token waste.
- Hybrid search wins in production. Combining keyword and vector search with RRF delivers the best of both worlds.
- ONNX enables vector search anywhere. No GPU, no CUDA, no heavy frameworks—just a 67MB model that runs on a basic laptop.
- Measure before choosing. Vector search dominates for semantic queries; keyword search wins for exact terms like names or IDs. Test both—and combine them.
My open-source homework and next steps
All code from my Module 2 journey is available on GitHub:
github.com/Derrick-Ryan-Giggs/llm-zoomcamp-2026
It includes:
- `vector-search.ipynb`: embeddings, Qdrant integration, and vector-based RAG
- `Vector Search Homework.ipynb`: full implementation walkthrough
If you’re exploring AI-powered search, the LLM Zoomcamp remains a free, no-barrier way to learn. No paywalls, no certificate fees—just hands-on labs and community support.
Sign up at DataTalksClub’s repository. Whether you’re building a chatbot, research assistant, or knowledge system, the right search engine can make or break your results. Start experimenting today.