How contextual documentation tames AI hallucinations in legacy codebases

When a fast-growing fintech startup moved from prototyping to maintaining a three-year-old codebase, engineers expected AI coding assistants to cut feature-development time in half. Instead, the tools repeatedly duplicated user data, created redundant tables, and overlooked critical foreign-key relationships. The real breakthrough came not from tuning prompts, but from realizing that AI needs the same context a human newcomer requires: clear rules, documented decisions, and guardrails that prevent costly mistakes.

Why legacy code breaks AI tools

Production systems with years of accumulated commits, complex inter-service dependencies, and domain-specific business logic confuse large language models. Unlike greenfield projects where every file is freshly written, legacy codebases contain historical decisions that have never been formally captured. An AI assistant trained on GitHub snapshots cannot infer that “contact” means something different from “user” in your domain unless you explicitly define the terms.

One incident crystallized the problem. Asked to add a contacts table, the AI dutifully created a new entity with first_name, last_name, email, and phone_number—fields already present in the users table. Without foreign-key guidance, it duplicated data rather than linking tables. The mistake wasn’t stupidity; it was a lack of context. Once the team framed the issue as a knowledge gap instead of a tool failure, they could design a solution.
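
A minimal sketch of the linked design the AI missed, using Python's built-in sqlite3 module. The users columns come from the incident above; the user_id and label columns, the sample row, and the in-memory database are assumptions made for illustration, not the startup's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT, email TEXT, phone_number TEXT
);
-- Hypothetical contacts table: it points at a user instead of copying the name fields.
CREATE TABLE contacts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    label TEXT
);
""")

conn.execute(
    "INSERT INTO users VALUES (1, 'Ada', 'Lovelace', 'ada@example.com', '555-0100')"
)
conn.execute("INSERT INTO contacts (user_id, label) VALUES (1, 'billing')")


def contact_emails(conn):
    """Read each contact's email through the join rather than from a copied column."""
    query = (
        "SELECT c.label, u.email FROM contacts c "
        "JOIN users u ON u.id = c.user_id ORDER BY c.id"
    )
    return conn.execute(query).fetchall()


print(contact_emails(conn))  # [('billing', 'ada@example.com')]
```

Because the email lives only in users, an update there is visible to every contact that references the row, which is the property the duplicated table lost.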

Building an AI onboarding kit

The solution borrows from software-engineering best practices but repurposes them for machine collaborators. The team created three artifacts that act like a new hire's orientation binder: a decision ledger of architectural rules, a context glossary, and a high-level map of the system. A mandatory test suite then enforces them.

1. Architectural Decision Records (ADRs) with bite-sized rules

A new docs/adr/ folder hosts numbered documents that answer common questions the AI might ask during coding tasks:

How do we add a new database table without duplicating columns?
Where should API routes for financial integrations live?
What naming convention do we use for query helpers?

Each ADR includes a concrete rule that overrides general best-practice advice. For example, ADR-001 states: “Before creating a new table column, query the existing users table. If the field already exists, add a foreign key instead of duplicating data; if the field belongs conceptually to users, propose a migration rather than a new column.”
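
The check that ADR-001 asks for could be automated along these lines. This is a sketch under assumptions: the function name overlapping_columns and the SQLite backend are hypothetical, since the article does not say which database the startup runs or how the AI performs the check.

```python
import sqlite3


def overlapping_columns(conn, proposed_columns, existing_table="users"):
    """Return the proposed column names that already exist on existing_table."""
    # PRAGMA statements cannot take bound parameters, so accept only a plain identifier.
    if not existing_table.isidentifier():
        raise ValueError(f"unexpected table name: {existing_table!r}")
    # Each PRAGMA table_info row describes one column; index 1 holds its name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({existing_table})")}
    return sorted(set(proposed_columns) & existing)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, email TEXT, phone_number TEXT)"
)

# Columns the AI proposed for the contacts table in the incident, plus one new field.
proposed_contacts = ["first_name", "last_name", "email", "phone_number", "label"]
duplicates = overlapping_columns(conn, proposed_contacts)
print(duplicates)  # ['email', 'first_name', 'last_name', 'phone_number']
```

A non-empty result is the signal to stop and add a foreign key, as the rule requires, rather than create the overlapping columns.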

2. A canonical context glossary and system plot

The team authored two files at the root of the docs directory:

context.md – a glossary that defines fintech-specific terms such as “verified_user,” “risk_profile,” and “transaction_journal,” explaining how each differs from generic alternatives.
plot.md – a one-page narrative that describes the product, its data flows, and the high-level boundaries between frontends, backend services, and external integrations.

Both files open with a bold directive: “The docs directory is the single source of truth. Follow its rules in order; do not skip steps.”

3. Mandatory test coverage for every new API route

Before any pull request can merge, a GitHub Action runs a full test suite that includes:

Unit tests for every new endpoint
Schema validation against the current production database
Integration tests that simulate financial-data feeds

During one deployment cycle, the AI refactored a shared utility function used by twelve downstream routes. Without tests, the change would have silently broken eight integrations. The failing tests surfaced the issue immediately; the AI then generated a backward-compatible version that preserved existing behavior while adding the new feature. The guardrails transformed a potential outage into an autonomous fix.
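
The article does not show the utility in question, so the following is only a generic sketch of what a backward-compatible change can look like. The format_amount function and its currency parameter are hypothetical: the new behavior sits behind an optional keyword argument, so existing callers see the same output as before.

```python
def format_amount(cents, *, currency=None):
    """Format an integer amount of cents as a decimal string.

    currency is the new, optional feature. Leaving it out reproduces the
    original output, so existing call sites need no changes.
    """
    sign = "-" if cents < 0 else ""
    whole, frac = divmod(abs(cents), 100)
    text = f"{sign}{whole}.{frac:02d}"
    return text if currency is None else f"{text} {currency}"


# Existing call sites keep their behavior.
print(format_amount(1234))  # 12.34
# New call sites opt in to the added feature.
print(format_amount(1234, currency="EUR"))  # 12.34 EUR
```

Tests that pin the old call signature are what make this kind of change safe to hand to an AI: any refactor that alters the default output fails immediately.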

Cultural shift: from skepticism to adoption

When the team first demoed the system, reactions ranged from mild skepticism to outright distrust. “AI has burned us before,” one engineer remarked. After watching the AI autonomously correct its own schema duplication and pass all tests, three teammates asked to replicate the setup in their own projects.

The lesson was clear: AI reliability scales with the quality of the onboarding. Blaming the tool for hallucinations is like blaming a trainee for not knowing company policy. The fix is not to demand flawless code out of the box, but to provide the same orientation that turns a new hire into a productive teammate.

Replicating the system in your codebase

Setting up the onboarding kit takes less than a day. Create this layout in your docs folder:

docs/
├── context.md        # Domain-specific definitions
├── plot.md           # High-level architecture map
└── adr/
    ├── 001-table-creation.md
    ├── 002-api-structure.md
    └── 003-query-patterns.md
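
The layout above can be scaffolded with a short script. This is a sketch: the scaffold_docs function and the placeholder heading it writes into each file are assumptions, and it leaves any file that already exists untouched.

```python
from pathlib import Path

# File names taken from the tree above.
DOC_FILES = [
    "context.md",
    "plot.md",
    "adr/001-table-creation.md",
    "adr/002-api-structure.md",
    "adr/003-query-patterns.md",
]


def scaffold_docs(root):
    """Create the docs layout under root and return the files that were created."""
    docs = Path(root) / "docs"
    created = []
    for relative in DOC_FILES:
        path = docs / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never overwrite documentation that is already there
            path.write_text(f"# {path.stem}\n", encoding="utf-8")
            created.append(path)
    return created
```

Calling scaffold_docs(".") from the repository root creates the five files; running it a second time creates nothing.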

Key practices to enforce:

Write a new ADR after every AI-induced mistake; turn failures into rules.
Keep the glossary and plot files authoritative—list them at the top of every prompt.
Require test coverage before merging; make test failures part of the AI’s feedback loop.
Review ADRs monthly to ensure they still reflect the current system.
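
One way to put the glossary and plot at the top of every prompt is to assemble the prompt from the docs folder itself. The sketch below assumes the layout shown earlier; the build_prompt function and the section headings it emits are hypothetical, and how the resulting text reaches your AI tool depends on that tool.

```python
from pathlib import Path


def build_prompt(docs_dir, task):
    """Assemble a prompt: glossary and plot first, then ADRs in order, then the task."""
    docs = Path(docs_dir)
    sections = []
    for name in ("context.md", "plot.md"):
        text = (docs / name).read_text(encoding="utf-8").strip()
        sections.append(f"## {name}\n{text}")
    # Sorting by file name keeps the numbered ADRs in their intended order.
    for adr in sorted((docs / "adr").glob("*.md")):
        text = adr.read_text(encoding="utf-8").strip()
        sections.append(f"## {adr.name}\n{text}")
    sections.append(f"## Task\n{task}")
    return "\n\n".join(sections)
```

Because the prompt is rebuilt from disk each time, a new ADR written after a mistake takes effect on the next task without anyone editing a prompt template.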

The real goal: predictable, not perfect, AI

This system does not guarantee error-free output. What it delivers is consistency: smaller, catchable mistakes and clear traces back to the rule that was violated. When the AI slips, the team gains a new ADR that tightens the onboarding. Over months, the documentation becomes a living specification that both humans and machines consult before changing the codebase.

The fintech startup started using AI to move faster. They ended up building a knowledge system that moves both humans and machines forward together—one rule, one test, and one refactored utility function at a time.

AI summary

How did a complex, three-year-old fintech project stop the hallucinations it ran into while using AI-assisted coding tools? Documentation, ADRs, and tests made the AI more consistent.