SurrealDB’s hybrid search blends vector and full-text for precise docs results

Searching through technical documentation has always been a balancing act between precision and relevance. Developers expect instant, context-aware results, but traditional full-text searches often miss nuanced meanings while pure vector searches can struggle with exact matches. SurrealDB’s recently launched hybrid search system for its documentation site solves this by combining both approaches into a single, open-source solution.

The new system powers the "Search the docs" field on SurrealDB’s official site, returning results that account for both keyword matches and semantic similarity. Unlike search implementations that depend on external tools or multi-stage pipelines, it runs entirely within SurrealDB’s query engine, using the native hybrid search capabilities introduced during the 3.0 beta period.

Why hybrid search matters in documentation

Full-text search excels at matching exact terms but falls short when dealing with synonyms, variations, or contextual meanings. For example, the word "lead" can refer to a project leader or a toxic metal, a distinction that a basic keyword search cannot capture. Vector search, on the other hand, uses embeddings to capture semantic relationships but may miss precise technical terms or rare phrases that appear verbatim in documentation.

SurrealDB’s approach merges these strengths using Reciprocal Rank Fusion (RRF), a method that combines the ranked results of the full-text and vector searches by looking at each document's position in each list rather than at its raw score. Documents that rank well in either list, and especially in both, rise toward the top, whether they contain the exact term or a semantically similar phrase.

Breaking down the SurrealDB implementation

The hybrid search system relies on two core components, a custom full-text analyzer and vector embeddings generated via OpenAI’s API, plus a fusion step that merges their results. Here’s how each part contributes to the final result:

Full-text analyzer: Uses a defined set of tokenizers and filters to preprocess text before indexing. For documentation, the analyzer splits text into tokens, normalizes case, and applies language-specific stemming (e.g., reducing "frowning" to "frown").

The analyzer is defined in SurrealQL as follows:

  DEFINE ANALYZER simple TOKENIZERS blank, class, camel, punct 
    FILTERS snowball(english);

To see how it works in practice, you can analyze a sample sentence:

  search::analyze(
    "simple",
    "The project lead frowned and took a hard look at the results."
  ).join(' ');

The output normalizes variations like "frowned" to "frown", enabling matches against different forms of the same word.
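To make the tokenize-then-stem idea concrete outside SurrealDB, here is a minimal Python sketch. It is not SurrealDB's analyzer: the `analyze` and `toy_stem` names are hypothetical, punctuation is simply discarded, and the suffix stripper is a toy stand-in for the Snowball stemmer rather than an implementation of it.

```python
import re

# Toy stand-in for a real stemmer: strips a few common English suffixes.
# This is NOT the Snowball algorithm, only an illustration of the idea.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(token: str) -> str:
    """Remove the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    """Split on whitespace, punctuation and camelCase, lowercase, then stem."""
    # Insert a space at each lowercase-to-uppercase boundary (camelCase).
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    # Keep runs of letters or runs of digits; everything else is dropped.
    tokens = re.findall(r"[A-Za-z]+|[0-9]+", spaced)
    return [toy_stem(token.lower()) for token in tokens]


tokens = analyze("The project lead frowned and took a hard look at the results.")
print(" ".join(tokens))
```

Because "frowned" and "frowning" both reduce to the same token here, a query using one form can match text that uses the other, which is the effect the stemming filter is meant to achieve.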

Vector embeddings: Generate semantic representations of text snippets, allowing the system to match concepts rather than exact words. For instance, a query about "database performance" could surface results mentioning "query speed" or "latency optimization" if their embeddings are similar.
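As a rough illustration of what "similar embeddings" means, the sketch below ranks two snippets against a query by cosine similarity. The three-dimensional vectors are made up for the example (real embeddings have hundreds or thousands of dimensions), and the choice of cosine similarity is an assumption, since the article does not say which distance metric the docs search uses.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: the first axis loosely stands for "performance".
query = [0.9, 0.1, 0.0]  # e.g. "database performance"
docs = {
    "query speed": [0.8, 0.2, 0.1],
    "billing": [0.0, 0.1, 0.9],
}

# Rank snippets from most to least similar to the query.
ranked = sorted(
    docs,
    key=lambda name: cosine_similarity(query, docs[name]),
    reverse=True,
)
print(ranked)
```

The snippet about "query speed" shares no words with the query, yet it ranks first because its vector points in nearly the same direction.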

Hybrid fusion with RRF: Combines ranked lists from full-text and vector searches into a unified result set. The RRF function assigns scores based on the position of each result in its respective list, favoring items that appear near the top of multiple rankings.

  LET $fused = search::rrf(
    [$page_ft, $page_vs, $section_ft, $section_vs],
    60,
    80
  );
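The scoring idea behind RRF can be sketched in a few lines of Python. This follows the commonly published formula, in which each document receives 1 / (k + rank) from every list it appears in. It is an illustration only, not SurrealDB's internal implementation: the `rrf` helper, its parameter names, and the record IDs are hypothetical, and the article does not spell out how the numeric arguments of search::rrf map onto them.

```python
from collections import defaultdict


def rrf(ranked_lists, k=60, limit=None):
    """Fuse several ranked lists of IDs with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based: the top result of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:limit] if limit is not None else fused


# Hypothetical result lists, best match first.
full_text = ["page:install", "page:indexes", "page:auth"]
vector = ["page:indexes", "page:vectors", "page:install"]

fused = rrf([full_text, vector], k=60)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Here "page:indexes" comes out first because it sits near the top of both lists, ahead of "page:install", which is first in one list but third in the other. A larger k flattens the difference between adjacent ranks, while a smaller k gives more weight to the very top positions.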

Setting up hybrid search in your own projects

Implementing this system in your own applications requires three key steps:

Define a full-text analyzer: Customize tokenization rules to match your content’s language and structure. SurrealDB’s DEFINE ANALYZER command supports a variety of tokenizers (e.g., splitting by whitespace or camelCase) and filters (e.g., stemming or case normalization).

Generate and index embeddings: Use an embedding model (such as OpenAI’s) to convert text snippets into vector representations. Store these embeddings alongside your documents in SurrealDB.

Create hybrid queries: Use SurrealDB’s search::rrf() function to fuse full-text and vector search results. Combine queries across different fields (e.g., titles, descriptions, or content sections) to maximize relevance.

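For example, the full-text side of hybrid search on a page table starts with an index on the title field that uses the analyzer defined earlier and BM25 scoring: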

  DEFINE INDEX page_hybrid_title ON page FIELDS title
    FULLTEXT ANALYZER simple BM25;

The future of search in SurrealDB

Hybrid search is a meaningful step forward for developers building documentation sites, knowledge bases, or any application that needs results that are both precise and context-aware. Because the full-text search, the vector search, and the fusion step all run inside the database, there is no separate search service to deploy and keep in sync.

The open-source nature of this implementation means you can explore the complete codebase, test it locally, and adapt it to your needs. Whether you’re refining a technical blog or scaling an enterprise knowledge base, hybrid search offers a flexible, powerful solution that adapts to the nuances of language and meaning.

AI summary

How can you improve documentation search with SurrealDB’s hybrid search system? Learn more about the RRF method, which combines full-text and vector search.
