When a 46-repository codebase grows too large for any single engineer to grasp, even a static-analysis graph of it becomes unusable unless you solve the entry-point problem first. Ryan Tsuji, CTO at airCloset, hit this wall after unifying his company's codebase into a single knowledge graph via static analysis. The graph existed, but without semantic search, engineers still had to rely on grep or context clues to find what they needed. That defeated its core purpose: giving AI models verified facts instead of inferences.
The breaking point: semantic search as the foundation
In Part 1 of this series, Tsuji outlined four unresolved challenges: no semantic search, node explosion, undocumented functions, and the cost of maintaining parsers for new boundary patterns. He tackled semantic search first because without it the graph offered AI agents almost nothing. “If the graph exists but the only way to reach it is grep,” he wrote, “the model ends up inferring anyway.”
Reusing a proven pattern from database schemas
Tsuji had already solved a similar structural problem months earlier with db-graph, a project that mapped over 1,000 database tables across 21 schemas into a semantically searchable graph. The pattern was clear: extract schemas statically, generate AI-written descriptions, embed them as vectors, and enable natural-language queries. This approach worked so well internally that engineers stopped memorizing table names and started asking questions in plain English.
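The db-graph retrieval step can be sketched as follows: embed each AI-written table description once, then rank tables by cosine similarity to an embedded question. This is a minimal sketch under stated assumptions and does not reproduce airCloset's implementation. The function toyEmbed is a hypothetical word-hashing stand-in for a real embedding model, so it matches shared words rather than meaning. The table descriptions are invented, and only the 768-dimension figure comes from the article.

```typescript
// Hypothetical stand-in for a real embedding model: each lowercase word is
// hashed into one of DIM buckets. It captures word overlap, not meaning.
interface TableDoc {
  table: string;       // table name, e.g. "admin_rental_items"
  description: string; // AI-generated summary of what the table holds
}

const DIM = 768; // matches the article's vector size; the hashing is an assumption

function toyEmbed(text: string): number[] {
  const vec = new Array<number>(DIM).fill(0);
  for (const word of text.toLowerCase().match(/[a-z0-9_]+/g) ?? []) {
    let h = 0;
    for (let i = 0; i < word.length; i++) h = (h * 31 + word.charCodeAt(i)) >>> 0;
    vec[h % DIM] += 1;
  }
  return vec;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Embed every description once, then return a query function that ranks
// tables by similarity to a natural-language question.
function buildIndex(docs: TableDoc[]) {
  const vectors = docs.map((d) => ({ table: d.table, vec: toyEmbed(d.description) }));
  return (query: string, k = 3): string[] => {
    const q = toyEmbed(query);
    return vectors
      .map((v) => ({ table: v.table, score: cosine(q, v.vec) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((r) => r.table);
  };
}

const search = buildIndex([
  { table: "admin_delivery_orders", description: "Delivery orders shipped to members, with status and dates" },
  { table: "admin_rental_items", description: "Items a member is currently renting" },
  { table: "billing_invoices", description: "Monthly invoices and payment results" },
]);

console.log(search("which items is the member renting", 1)); // [ 'admin_rental_items' ]
```

Swapping toyEmbed for a real embedding call is the only change that makes the queries truly semantic. The indexing and ranking shape stays the same.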
He realized the same pattern could apply to code. Critically, code-graph already included “DB table nodes” as boundary nodes from static analysis. By joining code-graph with db-graph on those nodes, any code linked to a table inherited db-graph's semantic descriptions automatically, without a single new annotation. This insight shifted the design philosophy: stop treating graphs as isolated islands and start designing how they connect.
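The join itself needs no annotations, because both graphs already name the same tables. The sketch below is a hypothetical illustration with assumed node and edge shapes. It matches on a schema-qualified name, since db-graph spans 21 schemas, and adds a SAME_ENTITY edge for each match. A hypothetical describeReads helper then walks from a function to the descriptions of the tables it reads. The function, schema, and ID names are invented.

```typescript
// Hypothetical, simplified shapes for the two graphs.
interface CodeNode { id: string; kind: "Function" | "DbTable"; name: string } // DbTable name = "schema.table"
interface DbTable { id: string; schema: string; table: string; description: string }
interface Edge { from: string; to: string; type: "READS" | "SAME_ENTITY" }

// Add a SAME_ENTITY edge wherever a code-graph DbTable node and a db-graph
// table share the same schema-qualified name.
function joinOnTables(codeNodes: CodeNode[], tables: DbTable[]): Edge[] {
  const byName = new Map<string, DbTable>();
  for (const t of tables) byName.set(`${t.schema}.${t.table}`, t);
  const edges: Edge[] = [];
  for (const n of codeNodes) {
    const match = n.kind === "DbTable" ? byName.get(n.name) : undefined;
    if (match) edges.push({ from: n.id, to: match.id, type: "SAME_ENTITY" });
  }
  return edges;
}

// Follow READS edges out of a function, then SAME_ENTITY edges into db-graph,
// and return the AI-written descriptions of the tables it touches.
function describeReads(fnId: string, codeEdges: Edge[], joinEdges: Edge[], tables: DbTable[]): string[] {
  const tableNodeIds = new Set(codeEdges.filter((e) => e.type === "READS" && e.from === fnId).map((e) => e.to));
  const dbIds = new Set(joinEdges.filter((e) => tableNodeIds.has(e.from)).map((e) => e.to));
  return tables.filter((t) => dbIds.has(t.id)).map((t) => t.description);
}

// Invented sample data for illustration only.
const codeNodes: CodeNode[] = [
  { id: "fn:getWearing", kind: "Function", name: "getWearing" },
  { id: "code:rental", kind: "DbTable", name: "admin.admin_rental_items" },
];
const tables: DbTable[] = [
  { id: "db:rental", schema: "admin", table: "admin_rental_items", description: "Items a member is currently renting" },
];
const codeEdges: Edge[] = [{ from: "fn:getWearing", to: "code:rental", type: "READS" }];

const joinEdges = joinOnTables(codeNodes, tables);
console.log(describeReads("fn:getWearing", codeEdges, joinEdges, tables)); // [ 'Items a member is currently renting' ]
```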
Annotating only what matters: boundary intent
Joining the graphs supplied database context, but APIs, events, and pages still lacked meaning: static analysis can see function bodies and endpoints, yet it cannot infer the intent behind them. The solution was to embed intent directly in the code, but not everywhere. Retrofitting annotations onto tens of thousands of functions across 46 repositories was impractical for production teams.
Tsuji pivoted to a minimalist strategy: annotate only boundary nodes. Engineers don’t need explanations for every internal helper; they need clarity on entry points—what a screen does, what an API returns, what business milestone an event marks. This principle became the heart of the design: maximum meaning with minimum annotation.
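One way to enforce the boundary-only rule is a coverage check. It flags entry points that lack an intent line and deliberately ignores internal helpers. The article does not describe such a check. The node kinds, the choice of which kinds count as boundaries, and the missingIntent helper below are all assumptions made for illustration.

```typescript
// Hypothetical node kinds; which ones count as boundaries is an assumption.
type NodeKind = "Page" | "Api" | "Task" | "Event" | "Helper";
interface GraphNode { id: string; kind: NodeKind; annotations: Record<string, string> }

const BOUNDARY_KINDS = new Set<NodeKind>(["Page", "Api", "Task", "Event"]);

// Return IDs of boundary nodes without an @graph-business intent line.
// Helpers are skipped on purpose: they are never required to carry annotations.
function missingIntent(nodes: GraphNode[]): string[] {
  return nodes
    .filter((n) => BOUNDARY_KINDS.has(n.kind) && !n.annotations["business"])
    .map((n) => n.id);
}

const nodes: GraphNode[] = [
  { id: "page:/home", kind: "Page", annotations: { business: "Main screen." } },
  { id: "api:GET /api/v1/wearing", kind: "Api", annotations: {} },
  { id: "fn:formatDate", kind: "Helper", annotations: {} },
];

console.log(missingIntent(nodes)); // [ 'api:GET /api/v1/wearing' ]
```

Because only boundary kinds are checked, the annotation burden scales with the number of entry points rather than the number of functions.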
Building the service-product graph (SPG)
The resulting architecture—internally dubbed the service-product graph (SPG)—combines three peer graphs joined by SAME_ENTITY edges. There’s no hierarchy: you can start from any graph and traverse to the others.
- code-graph (structure): Functions, classes, and boundary nodes extracted via static analysis across 46 repositories.
- db-graph (DB context): 1,133 tables with AI-generated descriptions and 768-dimensional vector embeddings.
- annotation graph (intent): Boundary-focused annotations written only around entry points using @graph-* tags.
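The “no hierarchy” property means that once SAME_ENTITY edges exist, a traversal can start in any graph and reach the other two. The hypothetical sketch below makes a simplifying assumption: it walks every edge in both directions, even though edge types such as CALLS or READS are really directional. The node IDs, prefixed with the name of the graph they belong to, are invented.

```typescript
// Hypothetical edge list spanning all three graphs; IDs are prefixed by graph name.
interface Edge { from: string; to: string; type: string }

// Breadth-first search that treats every edge as two-way, so a query can start
// in code-graph, db-graph, or the annotation graph and still reach the others.
// Returns node IDs in the order they were first visited.
function reachable(start: string, edges: Edge[]): string[] {
  const adj = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const list = adj.get(a) ?? [];
    list.push(b);
    adj.set(a, list);
  };
  for (const e of edges) {
    link(e.from, e.to);
    link(e.to, e.from);
  }
  const seen = new Set<string>([start]);
  const order: string[] = [start];
  const queue: string[] = [start];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const next of adj.get(current) ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        order.push(next);
        queue.push(next);
      }
    }
  }
  return order;
}

const edges: Edge[] = [
  { from: "annotation:page:/home", to: "code:fn:HomePage", type: "SAME_ENTITY" },
  { from: "code:fn:HomePage", to: "code:fn:getWearing", type: "CALLS" },
  { from: "code:fn:getWearing", to: "code:table:admin_rental_items", type: "READS" },
  { from: "code:table:admin_rental_items", to: "db:admin_rental_items", type: "SAME_ENTITY" },
];

// Start on the database side and arrive at the screen that depends on the table.
console.log(reachable("db:admin_rental_items", edges));
```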
AI agents interact with the system through a single MCP server that traverses all three graphs; the annotation graph's MCP server acts as a proxy, forwarding calls to db-graph transparently. The annotation graph defines seven node types: Page, Section, Dialog, Field, Action, Api, and Task. It began as a screen-focused project called screen-graph and was renamed SPG once it expanded to cover backend APIs and tasks.
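The single-entry-point design can be sketched as a router. The agent sees one tool surface. Calls carrying a db-graph prefix are forwarded to db-graph, and everything else is answered from the annotation graph. This does not use the MCP SDK API. The makeRouter function, the db_graph. prefix, and both tool names are hypothetical. The handlers are synchronous here, whereas a real proxy would make network calls.

```typescript
// The seven annotation node types named in the article.
type SpgNodeType = "Page" | "Section" | "Dialog" | "Field" | "Action" | "Api" | "Task";
interface SpgNode { id: string; type: SpgNodeType; label: string }

// Hypothetical tool-call shape and handler signature.
interface ToolCall { name: string; args: Record<string, string> }
type Handler = (args: Record<string, string>) => unknown;

const DB_PREFIX = "db_graph."; // hypothetical namespace for forwarded tools

// One entry point: prefixed calls go to db-graph with the prefix stripped,
// everything else is handled by the annotation graph's own tools.
function makeRouter(local: Record<string, Handler>, dbGraph: Record<string, Handler>) {
  return (call: ToolCall): unknown => {
    const forwarded = call.name.startsWith(DB_PREFIX);
    const registry = forwarded ? dbGraph : local;
    const toolName = forwarded ? call.name.slice(DB_PREFIX.length) : call.name;
    const handler = registry[toolName];
    if (!handler) throw new Error(`unknown tool: ${call.name}`);
    return handler(call.args);
  };
}

const spgNodes: SpgNode[] = [
  { id: "page:/home", type: "Page", label: "Home Screen" },
  { id: "dialog:return-modal", type: "Dialog", label: "Return modal" },
];

const route = makeRouter(
  { find_nodes: (a) => spgNodes.filter((n) => n.type === a.type).map((n) => n.id) },
  { describe_table: (a) => `description of ${a.table}` }, // stand-in for the real db-graph service
);

console.log(route({ name: "find_nodes", args: { type: "Dialog" } }));
console.log(route({ name: "db_graph.describe_table", args: { table: "admin_rental_items" } }));
```

Keeping the prefix check in one place means the agent never needs to know that db-graph is a separate service.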
A real annotation in practice
Below is a fictional but representative annotation that mirrors actual production examples:
/**
 * @graph-page /home
 * @graph-business Main screen. Members can see what they're currently renting, buy items, and initiate returns.
 * @graph-label Home Screen
 * @graph-has-section banners, wearing-items, wearing-return, delivery-status
 * @graph-has-dialog buying-modal, return-modal
 * @graph-navigates-to /return-procedure, /checkout, /my-karte
 * @graph-calls GET /api/v1/wearing
 * @graph-reads admin_delivery_orders, admin_rental_items
 * @graph-flow styling-loop
 * @graph-status monthly-member
 */
Two elements drive the system's utility. First, the @graph-business field carries the intent text (written in Japanese in the live system), and that text is what gets vectorized for semantic search. Second, tags such as @graph-flow and @graph-status capture process-level context that static analysis alone cannot infer.
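The tags follow a fixed @graph-name value shape, so a small parser can turn a comment like the one above into structured data. The business text is what would be sent to the embedding model, and list-valued tags can become graph edges. The regex, the set of comma-separated tags, and parseGraphTags are assumptions for illustration and are not the production parser.

```typescript
// Hypothetical set of tags whose values are comma-separated lists.
const LIST_TAGS = new Set(["has-section", "has-dialog", "navigates-to", "calls", "reads"]);

// Collect every "@graph-<tag> <value>" line of a JSDoc-style comment into a map.
// List tags are split on commas; all other tags keep their full text as one value.
function parseGraphTags(comment: string): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const rawLine of comment.split("\n")) {
    const m = rawLine.match(/^\s*\*?\s*@graph-([a-z-]+)\s+(.+?)\s*$/);
    if (!m) continue; // skips "/**", "*/", and non-tag lines
    const tag = m[1];
    const value = m[2];
    const values = LIST_TAGS.has(tag) ? value.split(",").map((v) => v.trim()) : [value];
    result[tag] = (result[tag] ?? []).concat(values);
  }
  return result;
}

const sample = `/**
 * @graph-page /home
 * @graph-business Main screen. Members can see what they're currently renting.
 * @graph-has-dialog buying-modal, return-modal
 * @graph-reads admin_delivery_orders, admin_rental_items
 */`;

const parsed = parseGraphTags(sample);
console.log(parsed["has-dialog"]); // [ 'buying-modal', 'return-modal' ]
console.log(parsed["business"][0]); // the intent text that would be embedded
```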
What’s next: balancing scope and scalability
Tsuji has resolved the entry-point problem, but the other challenges from Part 1 are still open: node explosion, undocumented functions, and parser maintenance for new boundary patterns. Addressing them requires careful scoping, especially as the annotation strategy scales across more services and teams. The shift from screen-centric to service-product graphs signals a broader evolution: codebases aren't just code anymore. They're knowledge systems where intent, structure, and context converge to power AI agents and engineers alike.
As semantic search becomes the default interface, the next frontier isn’t bigger graphs—it’s smarter joins and more intentional annotations where they matter most.