Construction projects run on documents: blueprints, contracts, permits, and specifications. But most of these documents aren’t clean, digitized databases. They’re messy PDFs, handwritten notes, scanned drawings, and proprietary formats filled with industry jargon, abbreviations, and unspoken assumptions. General-purpose AI models struggle with this chaos. They’re trained for breadth, not depth, and can’t reliably interpret niche terms or follow domain-specific logic.
That’s why Trunk Tools, a construction project management company, built a specialized AI stack—what its team calls a "three-layer architecture"—to transform raw, unstructured data into actionable insights. The result? Review cycles shrank from months to single-digit days, costly field errors dropped, and autonomous agents gained the ability to reason across millions of pages of documentation, according to the company.
“We started with a clear goal: take data scattered across dispersed systems, clean it, structure it, map it to an ontology, and feed it into a knowledge graph before training AI models on top,” said Sarah Buchner, founder and CEO of Trunk Tools and a former carpenter. “This isn’t about making AI smarter in the abstract. It’s about making it reliable enough to handle the specific demands of construction workflows.”
Why general models fail on industry-specific data
Foundation models like large language models (LLMs) are designed to perform well across a wide range of tasks. But that breadth comes at a cost. They often lack the precision required for niche domains where terminology, context, and formatting are everything.
Kriti Faujdar, senior product manager in AI infrastructure and agentic systems, put it bluntly: “General-purpose LLMs are trained to be okay at everything, so they’re weak at anything niche.” She pointed to rare terms, domain-specific reasoning, and implicit knowledge that practitioners take for granted but models can’t infer. For example, a model might grasp the general concept behind a clause in a legal contract but fail to cite the exact article reference a lawyer needs.
Sébastien De Bollivier, a software developer specializing in AI systems, echoed this concern. He noted that enterprise data, the most valuable kind, is often locked in internal systems, proprietary formats, or unstructured documents. Retrieval-augmented generation (RAG) can help surface relevant facts, but it doesn’t solve the underlying problem: the model still can’t reason effectively within the domain.
Faujdar emphasized that pre-training on domain-specific data is essential. Enterprises should fine-tune models on high-quality, task-specific examples and build their own evaluation benchmarks. “A few thousand real-world examples from practitioners beats millions of noisy, scraped ones every time,” she said.
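To make the idea of an in-house evaluation benchmark concrete, here is a minimal sketch in Python. It assumes the simplest possible metric (normalized exact match) and a tiny hand-written example set; the names `EvalExample`, `exact_match_accuracy`, `BENCHMARK`, and `toy_model` are illustrative and do not come from Faujdar or Trunk Tools. A real benchmark would use practitioner-written examples and a metric suited to the task.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalExample:
    """One practitioner-written test case (hypothetical shape)."""
    prompt: str
    expected: str


def normalize(text: str) -> str:
    # Ignore case and extra whitespace so trivial differences are not errors.
    return " ".join(text.lower().split())


def exact_match_accuracy(model: Callable[[str], str],
                         examples: list[EvalExample]) -> float:
    """Fraction of examples where the model output matches the expected answer."""
    if not examples:
        raise ValueError("benchmark needs at least one example")
    hits = sum(normalize(model(ex.prompt)) == normalize(ex.expected)
               for ex in examples)
    return hits / len(examples)


# Illustrative examples only: common construction abbreviations.
BENCHMARK = [
    EvalExample("Expand the abbreviation: RFI", "Request for Information"),
    EvalExample("Expand the abbreviation: GC", "General Contractor"),
    EvalExample("Expand the abbreviation: CO", "Change Order"),
    EvalExample("Expand the abbreviation: NTP", "Notice to Proceed"),
]


def toy_model(prompt: str) -> str:
    """Stand-in for a real model: it only knows two of the four terms."""
    known = {"RFI": "request for information", "GC": "General  Contractor"}
    abbreviation = prompt.rsplit(": ", 1)[-1]
    return known.get(abbreviation, "unknown")


if __name__ == "__main__":
    print(f"accuracy: {exact_match_accuracy(toy_model, BENCHMARK):.2f}")
```

Because the harness accepts any callable, the same example set can score a general-purpose model and a fine-tuned one side by side.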
She also recommended using mixture-of-experts (MoE) architectures to balance specialization and efficiency. Combining RAG with fine-tuning can work well: RAG handles long-tail factual retrieval while fine-tuning ensures the model uses the right vocabulary and follows domain-specific logic. De Bollivier advised a hybrid approach: a general-purpose model for reasoning and orchestration, paired with a fine-tuned, smaller model or dense retrieval system for domain-specific extraction.
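The division of labor between retrieval and a tuned model can be sketched in a few lines of Python. This is a toy under stated assumptions: keyword overlap stands in for a dense retrieval system, `domain_model` is a placeholder callable for whatever fine-tuned model a team uses, and the corpus entries are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Passage:
    doc_id: str
    text: str


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Rank passages by shared keywords (a stand-in for dense retrieval)."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(p.text)), p) for p in corpus]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [p for _, p in relevant[:k]]


def answer(query: str, corpus: list[Passage],
           domain_model: Callable[[str], str]) -> str:
    """Retrieval supplies the facts; the domain model supplies the phrasing."""
    passages = retrieve(query, corpus)
    context = "\n".join(f"[{p.doc_id}] {p.text}" for p in passages)
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return domain_model(prompt)


# Invented corpus entries, for illustration only.
CORPUS = [
    Passage("rfi-17", "RFI about concrete curing time on level 2"),
    Passage("sched-A", "Door schedule: door 101 is hollow metal, door 102 is wood"),
    Passage("spec-08", "Hollow metal doors and frames: fire rating 90 minutes"),
]
QUERY = "What fire rating applies to hollow metal doors?"

if __name__ == "__main__":
    # An echo function stands in for the model so the assembled prompt is visible.
    print(answer(QUERY, CORPUS, domain_model=lambda prompt: prompt))
```

Keeping retrieval and generation behind separate functions means either half can be swapped out, for example replacing keyword overlap with an embedding index, without touching the other.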
“Don’t fine-tune to make the model ‘smarter’ about a domain,” he said. “Fine-tune to make it more reliable on the exact output format your workflow demands.”
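One way to measure reliability on an output format is to validate every model response against the structure the workflow expects. The Python sketch below assumes a hypothetical JSON format with four fields; the field names and severity levels are invented for illustration and are not taken from De Bollivier or Trunk Tools.

```python
import json

# Hypothetical output contract for a document-review workflow.
REQUIRED_FIELDS = {"sheet": str, "item": str, "issue": str, "severity": str}
ALLOWED_SEVERITIES = {"low", "medium", "high"}


def validate_output(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the output conforms."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected_type):
            problems.append(f"wrong type for field: {name}")
    severity = data.get("severity")
    if isinstance(severity, str) and severity not in ALLOWED_SEVERITIES:
        problems.append(f"unknown severity: {severity}")
    return problems


def format_reliability(outputs: list[str]) -> float:
    """Share of model outputs that satisfy the contract."""
    if not outputs:
        raise ValueError("need at least one output to score")
    return sum(not validate_output(o) for o in outputs) / len(outputs)
```

Tracking `format_reliability` before and after fine-tuning gives a direct reading on the kind of improvement De Bollivier describes, separate from any judgment about whether the model has become "smarter".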
Industries like construction, legal, and healthcare are seeing strong traction with these techniques because they combine high error stakes with standardized document formats—making the return on investment clear. However, Faujdar cautioned that specialized models can struggle outside their domain, often requiring retraining to remain useful beyond their original purpose.
Inside Trunk’s three-layer stack: From raw documents to agent-ready insights
Trunk’s approach starts with a harsh truth: most AI models can’t reliably interpret construction documents out of the box. Amrish Kapoor, Trunk’s CTO, explained why: “Transformers are probabilistic. When shown an image, they’ll say it’s probably a tree or probably a child playing near a tree. But in construction, a 2-millimeter symbol in one location means something entirely different than the same symbol elsewhere.”
This probabilistic nature makes general models ill-suited for high-precision tasks. They are also constrained by limited context windows, so they struggle to maintain long-term project memory, which is critical in construction, where projects span months or years.
To overcome these limitations, Trunk built a three-layer system:
- Perception layer: Extracts and interprets data from messy, unstructured documents like PDFs, scans, and drawings. This layer teaches AI to “read” the visual language of construction—where a door might be represented by a simple arc, not the word itself.
- Semantic/graph layer: Transforms extracted data into meaningful relationships. For instance, it connects a door symbol in a drawing to the specification sheet that governs it, the installation trade responsible, and the cost estimate tied to that component.
- Agents layer: Deploys AI agents to reason over this structured knowledge, triggering workflows based on context. Agents don’t just extract data—they act on it, flagging inconsistencies, generating comparisons, and producing narratives that humans can act on.
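The three layers above can be sketched as a small Python pipeline. This is a structural illustration only, not Trunk's implementation: the perception step is a stub that accepts already-detected symbols (a real one would run vision models over drawings), the graph is an in-memory dictionary, and every class, function, relation name, and tag is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Perception-layer output: one item found on a sheet (hypothetical shape)."""
    sheet: str
    kind: str
    tag: str


class KnowledgeGraph:
    """Semantic layer: typed edges such as (door) -governed_by-> (spec section)."""

    def __init__(self) -> None:
        self._edges: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, source: str, relation: str, target: str) -> None:
        self._edges[(source, relation)].add(target)

    def targets(self, source: str, relation: str) -> set[str]:
        return set(self._edges.get((source, relation), set()))


def perceive(raw_symbols: list[tuple[str, str, str]]) -> list[Detection]:
    # Stub: wraps (sheet, kind, tag) tuples that a real detector would produce.
    return [Detection(sheet, kind, tag) for sheet, kind, tag in raw_symbols]


def build_graph(detections: list[Detection], spec_links: dict[str, str],
                trade_links: dict[str, str]) -> KnowledgeGraph:
    graph = KnowledgeGraph()
    for d in detections:
        graph.add(d.tag, "shown_on", d.sheet)
        if d.tag in spec_links:
            graph.add(d.tag, "governed_by", spec_links[d.tag])
        if d.tag in trade_links:
            graph.add(d.tag, "installed_by", trade_links[d.tag])
    return graph


def review_agent(graph: KnowledgeGraph, detections: list[Detection]) -> list[str]:
    """Agents layer: flag items whose expected relationships are missing."""
    flags = []
    for tag in sorted({d.tag for d in detections}):
        if not graph.targets(tag, "governed_by"):
            flags.append(f"{tag}: no governing specification found")
        if not graph.targets(tag, "installed_by"):
            flags.append(f"{tag}: no installation trade assigned")
    return flags


if __name__ == "__main__":
    found = perceive([("A-101", "door", "D101"), ("A-101", "door", "D102")])
    graph = build_graph(found,
                        spec_links={"D101": "hollow metal doors spec"},
                        trade_links={"D101": "doors and hardware",
                                     "D102": "doors and hardware"})
    for flag in review_agent(graph, found):
        print(flag)
```

The point of the separation is that the agent never touches pixels or raw text: it reasons only over relationships in the graph, so a gap such as a door with no governing specification shows up as a missing edge.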
Buchner highlighted the difference between generic and domain-specific interpretation. “A model might tell you there’s a door on the wall. But a construction engineer needs to know: Does this door create a downstream problem?” she said. “A conflict caught in the design phase costs pennies to fix. The same issue in the field? That’s tens of thousands of dollars.”
The system begins by identifying document types—drawings, schedules, text paragraphs—and extracting information accordingly. That data is then transformed, augmented, and fed into agentic workflows. For example, an agent might compare two versions of an architectural bulletin, generate a visual overlay showing additions and removals, and write a plain-language summary of the changes. This doesn’t just save time—it reduces risk by ensuring everyone on the project understands critical updates.
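The version-comparison step can be illustrated with a short Python sketch. It assumes each bulletin version has already been extracted into a mapping from item tag to description, which is a simplification; the tags and descriptions are invented, and the visual overlay mentioned above is out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeSet:
    added: list[str]
    removed: list[str]
    changed: list[str]


def compare_versions(old: dict[str, str], new: dict[str, str]) -> ChangeSet:
    """Diff two extracted versions keyed by item tag."""
    return ChangeSet(
        added=sorted(new.keys() - old.keys()),
        removed=sorted(old.keys() - new.keys()),
        changed=sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    )


def summarize(changes: ChangeSet) -> str:
    """Plain-language summary of what moved between versions."""
    parts = []
    for label, items in (("added", changes.added),
                         ("removed", changes.removed),
                         ("changed", changes.changed)):
        if items:
            parts.append(f"{len(items)} {label} ({', '.join(items)})")
    return "; ".join(parts) + "." if parts else "No changes."


if __name__ == "__main__":
    # Invented example data for two versions of the same bulletin.
    version_1 = {"D101": "hollow metal, 90 min", "D102": "wood", "W3": "fixed window"}
    version_2 = {"D101": "hollow metal, 45 min", "D102": "wood", "D103": "aluminum"}
    print(summarize(compare_versions(version_1, version_2)))
```

Keying the comparison on item tags rather than raw text lines means a reworded note that leaves the description unchanged does not register as a change, which keeps the summary focused on differences that matter in the field.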
A blueprint for vertical AI in high-stakes industries
Trunk’s experience offers a playbook for other industries drowning in unstructured, domain-specific data. The key isn’t just fine-tuning a model on industry jargon—it’s redesigning the entire stack to align with real-world workflows.
Buchner sees parallels in legal, healthcare, and manufacturing, where standardized documents and high error costs create clear ROI for domain-specific AI. But the challenge remains: building systems that don’t just understand data, but act on it reliably, safely, and at scale.
As AI adoption accelerates, the companies that succeed won’t be the ones chasing the latest model; they’ll be the ones building the infrastructure to make those models useful in their specific contexts. For Trunk, that meant ditching general-purpose tools and building a stack tailored to the language, logic, and stakes of construction. The result, by the company’s account: faster reviews, fewer errors, and a foundation for truly autonomous project management.
The next frontier? Scaling this approach beyond individual projects—toward enterprise-wide knowledge graphs that connect every document, decision, and dependency across a company’s entire portfolio of work.