AI agents can coordinate without a central controller, cutting costs by roughly 50%

A long-held assumption in AI development—that multi-agent systems require a central orchestrator to manage tasks—may be unnecessarily inflating costs and slowing down workflows. Researchers at Stanford have challenged this model with a new framework called DeLM, short for decentralized language model, which allows AI agents to coordinate directly without a single point of control.

The innovation hinges on a shared knowledge base that acts as a "common communication substrate," enabling agents to build on verified progress without constantly routing updates through a central controller. According to the framework’s co-developers, Yuzhen Mao and Azalia Mirhoseini, this setup allows agents to "avoid repeated failures, preserve constraints, and recover detailed evidence only when needed."

Why traditional multi-agent systems fall short

Most AI systems today rely on a hierarchical structure where a central controller breaks tasks into subtasks, assigns them to sub-agents, and then merges their responses before proceeding. While this approach scales reasoning, it introduces bottlenecks. Every partial finding, failure, or update must be reported back to the main agent, which then decides what to share and how to redistribute it.
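As a rough illustration of this hierarchical pattern, the controller-mediated loop can be sketched in a few lines of Python. The names `run_centralized`, `split`, `sub_agent`, and `merge` are hypothetical and do not come from the Stanford work. Sub-agents are called one after another here for simplicity, though a real system would likely dispatch them in parallel. The point is only that every result flows back through one place.

```python
def run_centralized(task, split, sub_agent, merge):
    """Controller-mediated flow: every result returns to the controller.

    `split`, `sub_agent`, and `merge` are hypothetical stand-ins for
    model calls; they are assumptions made for this sketch.
    """
    subtasks = split(task)                        # controller decomposes the task
    responses = [sub_agent(s) for s in subtasks]  # sub-agents never see each other's results
    return merge(responses)                       # controller integrates everything itself


# Toy usage with plain string functions standing in for agents.
answer = run_centralized("a b c", str.split, str.upper, " ".join)
```

Because the sub-agents share nothing directly, anything one of them learns reaches the others only if the controller chooses to pass it along.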

As the number of subtasks grows, this controller becomes a communication and integration bottleneck. The main agent may also "dilute, omit, or distort" useful information, leading to lost progress or redundant work. In long-context reasoning scenarios, the controller groups related data into "evidence clusters" before knowing what’s relevant, forcing sub-agents to repeatedly seek clarification.

This back-and-forth slows coordination, increases latency, and limits scalability. The Stanford team argues that a single overloaded controller restricts adaptability, making the entire system less efficient.

How DeLM redefines agentic coordination

DeLM replaces the central controller with a decentralized architecture built on three pillars: parallel agents, a shared context, and a dynamic task queue.

The shared context functions as a curated repository of "gists"—concise, verified summaries of findings, partial progress, or failures. These gists include pointers to detailed evidence that agents can reference as needed. The task queue, meanwhile, is a pool of pending subtasks that agents can claim independently.
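One way to picture these two structures is the minimal Python sketch below. The class names, fields, and methods (`Gist`, `SharedContext`, `TaskQueue`, `evidence_ref`, and so on) are assumptions made for illustration, not the framework's actual API, and the sketch ignores thread safety.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Gist:
    """A concise summary plus a pointer to the detailed evidence behind it."""
    summary: str
    evidence_ref: str       # hypothetical key into the evidence store
    kind: str = "finding"   # e.g. "finding", "progress", or "failure"


@dataclass
class SharedContext:
    """Holds gists and the full evidence they point to."""
    gists: list = field(default_factory=list)
    evidence: dict = field(default_factory=dict)

    def publish(self, gist: Gist, full_evidence: str) -> None:
        # Store the detail once; agents normally see only the summary.
        self.evidence[gist.evidence_ref] = full_evidence
        self.gists.append(gist)

    def read(self) -> list:
        # Compact view that agents consult before starting work.
        return [g.summary for g in self.gists]

    def expand(self, gist: Gist) -> str:
        # Detailed evidence is fetched only when an agent needs it.
        return self.evidence[gist.evidence_ref]


class TaskQueue:
    """A pool of pending subtasks that agents claim independently."""

    def __init__(self, tasks):
        self._pending = deque(tasks)

    def claim(self) -> Optional[str]:
        # Returns None when no work is left.
        return self._pending.popleft() if self._pending else None

    def add(self, task: str) -> None:
        self._pending.append(task)
```

Keeping the summary and the evidence separate is what lets agents read a short shared context by default and pull the longer record only on demand.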

Here’s how the pipeline works:

Initialization: Inputs are divided into work units and added to a queue.

Parallel execution: Agents work concurrently, pulling tasks and reading the shared context to inform their decisions.

Compression and verification: Results are distilled into reusable gists, which are checked against supporting evidence. Only verified gists are shared.

Additional work (if needed): When the queue empties, the last agent to respond reviews the shared context to determine if further steps are necessary.

Final output: The last agent confirms completion and returns the final answer.
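The steps above can be sketched with Python threads and a standard queue. This is a simplified illustration under stated assumptions, not the researchers' implementation: `solve`, `verify`, and `review` are hypothetical stand-ins for model calls, and the follow-up review runs after all agents finish a round rather than inside the last agent to respond.

```python
import queue
import threading


def run_pipeline(work_units, solve, verify, review=lambda gists: [], num_agents=4):
    """Claim tasks in parallel, share only verified gists, return the context.

    solve(task, gists) -> (gist, evidence)   hypothetical agent call
    verify(gist, evidence) -> bool           hypothetical evidence check
    review(gists) -> list of follow-up tasks (empty list means done)
    """
    tasks = queue.Queue()
    for unit in work_units:              # Initialization
        tasks.put(unit)

    shared_context = []                  # verified gists only
    lock = threading.Lock()

    def agent():
        while True:
            try:
                task = tasks.get_nowait()    # claim a task independently
            except queue.Empty:
                return
            with lock:
                snapshot = list(shared_context)   # read what others have shared
            gist, evidence = solve(task, snapshot)
            if verify(gist, evidence):            # share only verified gists
                with lock:
                    shared_context.append(gist)

    while not tasks.empty():
        threads = [threading.Thread(target=agent) for _ in range(num_agents)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # Additional work: queue any follow-up tasks the review asks for.
        for follow_up in review(list(shared_context)):
            tasks.put(follow_up)

    return shared_context
```

The loop ends only when `review` returns no further tasks, so a real system would need a budget or step limit to guarantee termination.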

This design allows agents to exchange progress asynchronously, scale adaptively with task volume, and avoid redundant exploration. "Agents write compact, verified updates into a shared context that later agents can read directly," the researchers explain.

Real-world performance and applications

DeLM’s decentralized approach shines in scenarios where parallel reasoning and shared knowledge are critical. One key use case is software engineering test-time scaling, where models are given time to "think" and improve problem-solving. In concurrent debugging, for example, multiple agents can explore different hypotheses simultaneously while still sharing intermediate findings.

The framework also excels in long-context reasoning and multi-document question-answering. Agents can examine their own evidence clusters—such as collections of papers or code—while maintaining a global view of accumulated progress. This reduces the risk of missing relevant details or duplicating efforts.

Benchmark results underscore DeLM’s advantages. On SWE-bench Verified, which evaluates how well AI systems solve real-world software engineering problems, DeLM outperformed the strongest baseline by 10.5% while cutting cost per task by roughly 50%. On LongBench‑v2 Multi‑Doc QA, which tests long-context reasoning, DeLM achieved the highest accuracy across four leading model families: GPT‑5.4, Claude Sonnet, Gemini Flash, and DeepSeek‑V4‑Pro.

The framework’s efficiency stems from its ability to share failures and build on verified progress. As Mao noted in a recent post, this collaborative approach reduces redundant exploration and accelerates convergence toward correct solutions.

The future of decentralized AI agents

DeLM represents a shift toward more scalable, cost-effective AI systems that don’t rely on a single point of failure. By enabling agents to coordinate directly, the framework reduces inference costs and latency while improving accuracy. Its success in benchmarks suggests potential for broader applications, from autonomous research to complex decision-making.

As AI models grow more sophisticated, decentralized coordination could become a standard rather than an exception. Stanford’s work paves the way for systems that are not only smarter but also more efficient, aligning performance with practical constraints.

AI summary

DeLM, developed at Stanford, removes the need for a central orchestrator in multi-agent AI systems, cutting costs roughly in half and reducing coordination delays. Discover how it works and how it performs.
