A recent study from Stanford University has raised questions about whether enterprises are paying an unnecessary "AI swarm tax" by defaulting to multi-agent systems for complex reasoning tasks. The research suggests that single-agent models, when given adequate compute resources, frequently match or surpass the performance of elaborate multi-agent architectures—even in scenarios that traditionally favor collaborative approaches.
Multi-agent systems face scrutiny over compute efficiency
Industry trends have increasingly favored multi-agent AI frameworks, which distribute reasoning across multiple models operating in parallel or sequentially. These systems—often described as "planner agents," "debate swarms," or "role-playing" setups—break down problems by having individual agents process partial contexts before synthesizing answers through iterative communication.
While multi-agent systems have demonstrated impressive results in benchmarks, the Stanford team argues that many of these gains may stem from computational advantages rather than architectural superiority. The core issue lies in how performance comparisons are conducted. Multi-agent setups typically consume more test-time compute through extended reasoning traces, additional agent interactions, and longer token generation. This makes it difficult to determine whether observed improvements come from smarter design or simply from spending more resources.
Equalizing the playing field with strict token budgets
To address this imbalance, the researchers designed experiments that enforced a strict "thinking token" budget—a metric that counts only the tokens used for intermediate reasoning, excluding prompt tokens and final outputs. This approach ensures a fair comparison between single-agent systems (SAS) and multi-agent systems (MAS) by controlling for computational overhead.
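As a rough sketch of how such a budget-matched comparison might be scored (the tally logic, trace format, and token figures below are illustrative assumptions, not the paper's actual evaluation harness):

```python
# Illustrative sketch: compare SAS and MAS runs under an equal
# "thinking token" budget. Only intermediate reasoning tokens count;
# prompt tokens and final answers are excluded. All names and numbers
# here are hypothetical.

def thinking_tokens(trace: list[dict]) -> int:
    """Sum tokens from reasoning steps only, skipping prompts and answers."""
    return sum(step["tokens"] for step in trace if step["kind"] == "reasoning")

def within_budget(trace: list[dict], budget: int) -> bool:
    return thinking_tokens(trace) <= budget

# A single-agent run: one long, continuous reasoning trace.
sas_trace = [
    {"kind": "prompt", "tokens": 300},
    {"kind": "reasoning", "tokens": 900},
    {"kind": "answer", "tokens": 50},
]

# A multi-agent run: several shorter traces whose inter-agent
# reasoning all counts toward the same budget.
mas_trace = [
    {"kind": "prompt", "tokens": 300},
    {"kind": "reasoning", "tokens": 400},   # planner agent
    {"kind": "reasoning", "tokens": 350},   # worker agent
    {"kind": "reasoning", "tokens": 250},   # verifier agent
    {"kind": "answer", "tokens": 50},
]

budget = 1000
print(within_budget(sas_trace, budget))  # True: 900 thinking tokens
print(within_budget(mas_trace, budget))  # True: exactly 1000 thinking tokens
```

Under this kind of accounting, a MAS cannot quietly "win" by generating more intermediate text than its single-agent counterpart.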
The study focused on multi-hop reasoning tasks, which require connecting multiple pieces of information to arrive at answers. Under these controlled conditions, single-agent models consistently matched or outperformed their multi-agent counterparts. The researchers identified a particular inefficiency in SAS setups: some models prematurely halt their internal reasoning despite having unused compute budget. To mitigate this, they introduced SAS-L (Single-Agent System with Longer Thinking), a technique that restructures prompts to explicitly encourage models to fully utilize their reasoning capacity.
```python
# Example SAS-L prompt modification
original_prompt = "Answer the question based on the given context."
sas_l_prompt = (
    "First, identify any ambiguities in the question. "
    "Then, list possible interpretations of the context. "
    "Finally, evaluate each interpretation before selecting "
    "the most plausible answer."
)
```

The experiments showed that by restructuring prompts in this way, single-agent systems could achieve higher accuracy while consuming fewer reasoning tokens. When paired with advanced models like Google's Gemini 2.5, the SAS-L approach delivered the strongest overall performance across tested scenarios.
The data loss paradox: why single agents often win
The researchers attribute the advantage of single-agent systems to the data processing inequality, a result from information theory. This principle implies that every time information is summarized, filtered, or passed between agents in a multi-agent system, critical context can be lost but never regained. Multi-agent frameworks inherently introduce communication bottlenecks where partial results are repeatedly distilled, which can compound errors and reduce efficiency.
In contrast, a single agent operates within a continuous context, preserving the full richness of the input data throughout its reasoning process. This continuity allows the model to maintain coherence and avoid the compounding losses that plague multi-agent systems under fixed compute constraints.
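The underlying principle can be stated precisely. For any processing chain where the raw context $X$ is condensed into one agent's summary $Y$, which a downstream agent then turns into output $Z$ (a Markov chain $X \to Y \to Z$), mutual information can only shrink:

```latex
I(X; Z) \le I(X; Y)
```

In other words, no downstream agent can recover information about the original input that an upstream hand-off already discarded.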
However, the researchers note that multi-agent systems do have their place. They excel in scenarios where single agents struggle with degraded inputs—such as noisy data, excessively long prompts filled with distractors, or corrupted information. In these cases, the structured decomposition and verification processes of a multi-agent system can recover relevant information more reliably than a single agent working with a polluted context.
The overlooked costs of multi-agent orchestration
Beyond performance comparisons, the study highlights the hidden costs of multi-agent systems that enterprises often overlook. Each additional agent introduces:
- Increased communication overhead between components
- More intermediate text generation, consuming additional tokens
- Higher risk of error propagation through multiple processing stages
- Greater complexity in debugging and maintaining the system
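A back-of-the-envelope cost model makes these overheads concrete. The per-agent and per-hand-off figures below are illustrative assumptions for the sketch, not measurements from the study:

```python
# Illustrative cost model: total tokens grow with each added agent,
# because every hand-off re-serializes intermediate results as text.
# All numbers are hypothetical.

def pipeline_tokens(n_agents: int,
                    reasoning_per_agent: int = 400,
                    handoff_overhead: int = 150) -> int:
    """Total tokens for a linear pipeline: each agent reasons, and each
    of the n_agents - 1 hand-offs adds serialized intermediate text."""
    handoffs = max(n_agents - 1, 0)
    return n_agents * reasoning_per_agent + handoffs * handoff_overhead

for n in (1, 3, 5):
    print(n, pipeline_tokens(n))  # 1 -> 400, 3 -> 1500, 5 -> 2600
```

Even before accounting for error propagation or debugging effort, the token bill alone scales faster than linearly in practice, since hand-off overhead stacks on top of each agent's own reasoning.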
As Dat Tran and Douwe Kiela, the paper's authors, noted: "What enterprises often underestimate is that orchestration is not free. Every additional agent adds layers of complexity that can erode the benefits of distributed reasoning."
Rethinking default architectures for AI reasoning
The findings suggest that engineering teams should reconsider their default approach to complex reasoning tasks. Rather than immediately adopting multi-agent systems, developers should first evaluate whether a single-agent model with an optimized thinking budget—such as SAS-L—can meet performance requirements. Multi-agent architectures should be reserved for specific edge cases where single agents demonstrably cannot handle the problem constraints.
This research underscores the importance of rigorous, apples-to-apples comparisons in AI system evaluation. As AI adoption accelerates across industries, the pressure to deploy cutting-edge architectures must be balanced with careful attention to computational efficiency and real-world performance.



