iToverDose/Software · 19 MAY 2026 · 12:00

Benchmarking GraphRAG vs Traditional RAG for Indian Health Research

A new benchmarking platform compares three AI retrieval pipelines on 9,000+ Indian public health papers and sets out why GraphRAG is expected to outperform traditional RAG on complex medical queries that require multi-hop reasoning.

DEV Community · 4 min read

Researchers are developing a rigorous benchmarking platform to evaluate how different AI retrieval pipelines handle multi-hop medical questions in Indian public health literature. The system tests three approaches—LLM-only, traditional RAG with vector search, and GraphRAG using knowledge graphs—on a corpus of over 9,000 papers covering diabetes, tuberculosis, maternal health, and malaria.

The Challenge of Multi-Hop Medical Queries

Traditional retrieval-augmented generation (RAG) systems often struggle with questions that require connecting seemingly unrelated medical concepts. For example, when asked how diabetes affects tuberculosis treatment outcomes, a standard RAG system might return relevant chunks about diabetes, TB, and HbA1c levels. However, it fails to recognize the critical relationships between these concepts, which are essential to answering the question comprehensively.

Three key failure modes emerge in such scenarios:

  • Indirect relationships remain invisible: A query about rifampicin's impact on glycemic control in diabetic TB patients requires linking enzyme induction to glucose metabolism—information that may not appear in any single paper.
  • Entity roles get confused: Questions about MDR-TB treatment in pediatric patients might retrieve adult-focused studies due to similar keywords but different population contexts.
  • Corpus-wide aggregation becomes impossible: Queries asking for the most common comorbidities in Indian TB literature cannot be answered by examining individual chunks alone; they require synthesizing information across the entire corpus.

Building a Dedicated Benchmarking Platform

The platform evaluates three retrieval strategies using the same large language model (LLM) and identical queries to ensure fair comparison:

  • LLM-only: No retrieval, relying solely on the model's training data
  • Basic RAG: Uses FAISS for vector search with cross-encoder reranking
  • GraphRAG: Implements TigerGraph for multi-hop traversal of knowledge graphs
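
A minimal sketch of how the three strategies could share one harness so that only the retrieval step varies. The names `Retriever`, `NoRetrieval`, `StaticRetriever`, `RunResult`, and `run_pipeline` are hypothetical and do not come from the platform; `StaticRetriever` is a stand-in for the FAISS and TigerGraph back ends, and the `llm` argument is any callable that maps a prompt to an answer.

```python
import time
from dataclasses import dataclass
from typing import Callable, Protocol


class Retriever(Protocol):
    """Anything that maps a query to a list of context passages."""

    def retrieve(self, query: str) -> list[str]: ...


class NoRetrieval:
    """LLM-only baseline: supplies no context at all."""

    def retrieve(self, query: str) -> list[str]:
        return []


class StaticRetriever:
    """Stand-in for a real back end (vector search or graph traversal)."""

    def __init__(self, passages: list[str]) -> None:
        self.passages = passages

    def retrieve(self, query: str) -> list[str]:
        return list(self.passages)


@dataclass
class RunResult:
    pipeline: str
    answer: str
    n_context_passages: int
    latency_s: float


def run_pipeline(
    name: str, retriever: Retriever, llm: Callable[[str], str], query: str
) -> RunResult:
    """Run one query through one pipeline and time the whole call."""
    start = time.perf_counter()
    context = retriever.retrieve(query)
    # Same prompt template for every pipeline; only the context differs.
    prompt = "\n\n".join(context + [f"Question: {query}"])
    answer = llm(prompt)
    return RunResult(name, answer, len(context), time.perf_counter() - start)
```

Passing the same `llm` callable and the same query list to every pipeline keeps the retrieval strategy as the only variable in the comparison.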

Performance metrics include token usage, computational cost, response latency, LLM-as-a-Judge quality assessments, and BERTScore F1 measurements. While the corpus ingestion pipeline is complete, researchers are currently populating the Indian public health research database from PubMed Central.
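
To make the BERTScore F1 metric concrete, the sketch below shows the greedy-matching arithmetic behind it on plain lists of token vectors. This is an illustration only: real BERTScore derives the vectors from a contextual language model and offers optional IDF weighting and baseline rescaling, none of which is reproduced here. The function names `cosine` and `greedy_f1` are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_f1(cand_vecs: list[list[float]], ref_vecs: list[list[float]]) -> float:
    """BERTScore-style F1 from candidate and reference token vectors."""
    if not cand_vecs or not ref_vecs:
        return 0.0
    # Precision: each candidate token is matched to its closest reference token.
    precision = sum(
        max(cosine(c, r) for r in ref_vecs) for c in cand_vecs
    ) / len(cand_vecs)
    # Recall: each reference token is matched to its closest candidate token.
    recall = sum(
        max(cosine(r, c) for c in cand_vecs) for r in ref_vecs
    ) / len(ref_vecs)
    if precision + recall <= 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A candidate that covers only half of the reference tokens scores full precision but half recall, which is why the F1 value is reported instead of either component alone.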

Constructing the Indian Health Research Corpus

The team used NCBI's E-utilities API, accessed through Biopython's Entrez module, with domain-specific MeSH queries to compile papers from Indian institutions. The Python implementation retrieves matching record IDs in batches:

from Bio import Entrez

# NCBI asks for a contact address on every E-utilities request.
Entrez.email = "your@email.com"

def fetch_pmids(domain_query: str, max_results: int = 3000) -> list[str]:
    """Return up to max_results record IDs matching domain_query.

    Note: with db="pmc", ESearch returns PMC UIDs, not PubMed PMIDs.
    """
    ids: list[str] = []
    batch_size = 200
    start = 0
    while start < max_results:
        # Page through the result set with retstart/retmax. ESearch already
        # returns the IDs, so no EFetch call is needed at this stage.
        handle = Entrez.esearch(
            db="pmc",
            term=domain_query,
            retstart=start,
            retmax=min(batch_size, max_results - start),
        )
        search_results = Entrez.read(handle)
        handle.close()

        batch = search_results["IdList"]
        if not batch:
            break  # no more results
        ids.extend(str(record_id) for record_id in batch)
        start += batch_size
    return ids

A sample query for tuberculosis research incorporated MeSH terms and institutional filters:

(tuberculosis[MeSH] OR "TB"[tiab] OR "MDR-TB"[tiab]) AND ("India"[Affiliation] OR "Indian"[Affiliation]) AND (epidemiology[MeSH] OR "public health"[tiab] OR "clinical trial"[tiab])
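
Since the corpus spans several domains (diabetes, tuberculosis, maternal health, malaria), queries of this shape could be assembled from term lists instead of being written by hand. The helper names `or_group` and `build_query` below are hypothetical; the terms are the ones from the sample query.

```python
def or_group(terms: list[str]) -> str:
    """Join terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(
    topic_terms: list[str],
    affiliation_terms: list[str],
    scope_terms: list[str],
) -> str:
    """AND together the topic, affiliation, and scope groups."""
    groups = (topic_terms, affiliation_terms, scope_terms)
    return " AND ".join(or_group(group) for group in groups)


tb_query = build_query(
    topic_terms=["tuberculosis[MeSH]", '"TB"[tiab]', '"MDR-TB"[tiab]'],
    affiliation_terms=['"India"[Affiliation]', '"Indian"[Affiliation]'],
    scope_terms=["epidemiology[MeSH]", '"public health"[tiab]', '"clinical trial"[tiab]'],
)
```

Swapping only `topic_terms` would then yield the query for another domain while the affiliation and scope filters stay fixed.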

During corpus construction, several challenges emerged:

  • Approximately 8% of papers lacked abstracts, requiring fallback to full-text extraction
  • Affiliation strings varied widely (for example, "AIIMS" versus "All India Institute of Medical Sciences, New Delhi 110029") and needed standardization
  • Duplicate papers from PMC versioning required careful deduplication
  • Retraction Watch cross-references became essential for maintaining medical integrity
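
The deduplication step listed above might look roughly like the following. The article does not describe the team's actual method, so this is a sketch under assumptions: records are dicts with hypothetical `doi`, `title`, and `version` fields, duplicates are grouped by DOI (falling back to a normalized title), and the highest version wins.

```python
import re


def dedup_key(record: dict) -> str:
    """Group key: lowercased DOI if present, otherwise a normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return "title:" + title


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep one record per key, preferring the highest 'version' value."""
    best: dict[str, dict] = {}
    for record in records:
        key = dedup_key(record)
        current = best.get(key)
        if current is None or record.get("version", 1) > current.get("version", 1):
            best[key] = record
    return list(best.values())
```

Title-based matching is deliberately conservative here: it only collapses titles that are identical after stripping case and punctuation, so near-duplicates with edited titles would still need manual review.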

Affiliation filtering used multi-pass regex patterns covering country mentions, known Indian institution abbreviations (AIIMS, JIPMER, ICMR, PGIMER, NIMHANS, CMC Vellore), and major city names, achieving a 2-3% false positive rate.
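
A minimal sketch of what such a multi-pass filter could look like. The institution abbreviations are the ones named above; the city list and the function name `match_pass` are assumptions for illustration, since the article does not give the team's actual patterns.

```python
import re
from typing import Optional

# Passes are tried in order; the first one that matches wins.
PASSES = [
    ("country", re.compile(r"\bIndia\b", re.IGNORECASE)),
    (
        "institution",
        re.compile(r"\b(?:AIIMS|JIPMER|ICMR|PGIMER|NIMHANS|CMC Vellore)\b"),
    ),
    (
        "city",  # illustrative list only, not the team's
        re.compile(
            r"\b(?:New Delhi|Mumbai|Chennai|Kolkata|Bengaluru|Hyderabad|Pune)\b",
            re.IGNORECASE,
        ),
    ),
]


def match_pass(affiliation: str) -> Optional[str]:
    """Return the name of the first pass that matches, or None."""
    for name, pattern in PASSES:
        if pattern.search(affiliation):
            return name
    return None
```

The word boundaries keep "Indiana University" from matching the country pass, but a city-name pass can still misfire on places that share a name with a city elsewhere, which is one plausible source of the residual false positives.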

Knowledge Graph Design Determines Retrieval Success

The knowledge graph schema represents the most critical architectural decision. An overly sparse graph yields no results, while an overloaded one produces spurious connections. The final design includes 10 vertex types and 10 edge types:

Vertex types:

  • Disease
  • Treatment
  • Biomarker
  • Population
  • GeographicRegion
  • Intervention
  • Outcome
  • Study
  • Institution
  • Comorbidity

Edge types with semantic meaning:

  • TREATS: Treatment → Disease
  • ASSOCIATED_WITH: Disease ↔ Disease or Disease ↔ Biomarker
  • MEASURED_BY: Disease → Biomarker
  • RISK_FACTOR_FOR: Biomarker/Population → Disease
  • COMPLICATES: Disease ↔ Disease (bidirectional)
  • REPORTS_OUTCOME: Study → Outcome
  • STUDIED_IN: Study → Population/GeographicRegion
  • CO_OCCURS_WITH: Disease ↔ Disease (same study context)
  • CONDUCTED_BY: Study → Institution
  • PART_OF: GeographicRegion → GeographicRegion (hierarchy)
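
One way to make this schema enforceable during graph loading is to encode the allowed endpoint types per edge and reject anything else. The sketch below transcribes the lists above into plain Python; `EDGE_SCHEMA` and `is_valid_edge` are hypothetical names, and a "↔" edge is represented by listing both directions.

```python
VERTEX_TYPES = {
    "Disease", "Treatment", "Biomarker", "Population", "GeographicRegion",
    "Intervention", "Outcome", "Study", "Institution", "Comorbidity",
}

# edge type -> allowed (source type, target type) pairs
EDGE_SCHEMA: dict[str, set[tuple[str, str]]] = {
    "TREATS": {("Treatment", "Disease")},
    "ASSOCIATED_WITH": {
        ("Disease", "Disease"), ("Disease", "Biomarker"), ("Biomarker", "Disease"),
    },
    "MEASURED_BY": {("Disease", "Biomarker")},
    "RISK_FACTOR_FOR": {("Biomarker", "Disease"), ("Population", "Disease")},
    "COMPLICATES": {("Disease", "Disease")},
    "REPORTS_OUTCOME": {("Study", "Outcome")},
    "STUDIED_IN": {("Study", "Population"), ("Study", "GeographicRegion")},
    "CO_OCCURS_WITH": {("Disease", "Disease")},
    "CONDUCTED_BY": {("Study", "Institution")},
    "PART_OF": {("GeographicRegion", "GeographicRegion")},
}


def is_valid_edge(edge_type: str, source_type: str, target_type: str) -> bool:
    """True if the schema allows this edge type between these vertex types."""
    return (source_type, target_type) in EDGE_SCHEMA.get(edge_type, set())
```

The edge list as given names no endpoints of type Intervention or Comorbidity, so the sketch leaves those vertex types unconnected instead of guessing at their edges.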

Each edge carries a confidence score from the extraction model, and edges scoring below 0.65 are filtered out during retrieval. This quality gate proved essential for meaningful traversal. The final graph contains 17,830 vertices and 142,000 edges, giving an average vertex degree of 8.0 (edges per vertex, counting each edge once: 142,000 / 17,830), and a graph diameter of approximately six hops.
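
The confidence gate and the degree figure can both be checked with a few lines. This is a sketch with hypothetical names (`Edge`, `filter_edges`, `edges_per_vertex`); only the 0.65 threshold and the vertex and edge counts come from the text.

```python
from dataclasses import dataclass

MIN_CONFIDENCE = 0.65  # threshold stated in the article


@dataclass(frozen=True)
class Edge:
    source: str
    edge_type: str
    target: str
    confidence: float


def filter_edges(edges: list[Edge], threshold: float = MIN_CONFIDENCE) -> list[Edge]:
    """Drop edges whose extraction confidence is below the threshold."""
    return [edge for edge in edges if edge.confidence >= threshold]


def edges_per_vertex(n_edges: int, n_vertices: int) -> float:
    """Average number of edges per vertex, counting each edge once."""
    return n_edges / n_vertices


# Counts reported for the final graph; the ratio rounds to 8.0.
reported_ratio = edges_per_vertex(142_000, 17_830)
```

Counting each edge at both of its endpoints would roughly double that ratio, so the reported 8.0 is consistent with edges being counted once per vertex.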

Multi-Hop Retrieval in Practice

Consider the query: What is the impact of diabetes on TB treatment outcomes in India?

The GraphRAG system first extracts entities from the query using a biomedical spaCy model (for example, one of the scispaCy models). The traversal then follows semantic paths through the knowledge graph, connecting diabetes to TB outcomes via intermediate nodes such as HbA1c levels and patient populations. This approach mirrors how medical researchers synthesize information across multiple studies.
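
The traversal step can be illustrated with a breadth-first path search over a toy edge list. Everything here is a stand-in: the platform runs its traversal in TigerGraph, the function name `find_paths` is hypothetical, and the edges and confidence values in `TOY_EDGES` are invented for illustration and are not results from the corpus.

```python
from collections import deque

# (source, edge type, target, confidence); made-up values for illustration
TOY_EDGES = [
    ("Diabetes mellitus", "COMPLICATES", "Tuberculosis", 0.84),
    ("Diabetes mellitus", "MEASURED_BY", "HbA1c", 0.91),
    ("HbA1c", "RISK_FACTOR_FOR", "Tuberculosis", 0.78),
    ("Diabetes mellitus", "CO_OCCURS_WITH", "Malaria", 0.40),
]


def find_paths(edges, start, goal, max_hops=3, min_confidence=0.65):
    """Enumerate simple paths from start to goal of at most max_hops edges."""
    adjacency = {}
    for source, edge_type, target, confidence in edges:
        if confidence >= min_confidence:  # apply the confidence gate
            adjacency.setdefault(source, []).append((edge_type, target))

    paths = []
    queue = deque([(start, [], {start})])
    while queue:
        node, path, visited = queue.popleft()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) == max_hops:
            continue
        for edge_type, target in adjacency.get(node, []):
            if target not in visited:  # avoid cycles
                step = (node, edge_type, target)
                queue.append((target, path + [step], visited | {target}))
    return paths


paths = find_paths(TOY_EDGES, "Diabetes mellitus", "Tuberculosis", max_hops=2)
```

On this toy data the search finds the direct edge and the two-hop route through HbA1c, while the low-confidence Malaria edge never enters the adjacency map. The retrieved paths, not isolated text chunks, would then be passed to the LLM as context.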

While benchmark numbers are pending, early prototypes suggest that graph-based retrieval preserves relationships between entities that chunk-level vector search does not represent explicitly. The team anticipates that GraphRAG will show superior performance on complex medical queries requiring multi-hop reasoning, though comprehensive testing awaits completion of the Indian health corpus.

The platform represents an important step toward developing AI systems that can match the nuanced understanding required for medical research literature, particularly in specialized domains like Indian public health.

AI summary

Graph-based retrieval systems perform better than traditional retrieval systems and are particularly successful at understanding the relationships between concepts that are needed to answer multi-step questions.
