Most developers rely on instinct when judging AI coding tools’ impact on their speed. That instinct, a new study shows, is frequently wrong.
In July 2025, the nonprofit METR published findings from a meticulously designed randomized controlled trial—the kind of rigorous research typically reserved for pharmaceutical trials—that directly measured the effects of AI coding assistants on developer productivity. The results challenged widespread assumptions about AI’s benefits.
What the study discovered about AI’s true impact
METR’s research found that experienced developers using state-of-the-art AI coding tools (specifically Cursor Pro with Claude 3.5 and 3.7 Sonnet) took 19% longer to complete tasks when AI was allowed than when it was not. Before the trial, the same developers had predicted that AI would make them 24% faster. After the study, they still estimated that AI had made them about 20% faster, despite the measured slowdown.
This gap between perceived and measured productivity was statistically significant. The study covered 246 real-world tasks, from bug fixes to new features, in open-source repositories the developers already knew well, and each task was randomly assigned to either the AI-allowed or the no-AI condition. Randomization guarded against biases such as developers picking simpler tasks for AI or working harder with the tools they preferred.
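The random assignment described above can be sketched in a few lines. This is a minimal illustration, not METR’s actual procedure or estimator: the function names, the coin-flip assignment, and the simple difference in mean completion times are all assumptions made for the example.

```python
import random
from statistics import mean


def assign_conditions(task_ids, seed=0):
    """Assign each task to 'ai' or 'no_ai' by an independent coin flip.

    Hypothetical helper: the seed only makes the example reproducible.
    """
    rng = random.Random(seed)
    return {task: rng.choice(["ai", "no_ai"]) for task in task_ids}


def percent_change_in_time(assignments, minutes):
    """Percent change in mean completion time, AI tasks vs. no-AI tasks.

    A positive result means AI-allowed tasks took longer on average.
    This is a plain difference in means, used here only for illustration.
    """
    ai = [minutes[t] for t, cond in assignments.items() if cond == "ai"]
    no_ai = [minutes[t] for t, cond in assignments.items() if cond == "no_ai"]
    return (mean(ai) / mean(no_ai) - 1) * 100


# Usage with made-up timings (minutes per task), not data from the study.
example_assignments = {"fix-bug": "ai", "add-feature": "no_ai"}
example_minutes = {"fix-bug": 119, "add-feature": 100}
slowdown = percent_change_in_time(example_assignments, example_minutes)
print(f"AI-allowed tasks took {slowdown:.0f}% longer")
```

Because the coin flip decides the condition, neither the developer nor the task’s difficulty can influence which tasks end up in the AI group, which is the point of randomizing.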
The hidden costs behind AI’s slower pace
While the study didn’t fully explain why AI slowed developers, several patterns emerged that resonate with anyone who has integrated AI coding tools into their workflow.
Integration overhead disrupts workflow
When developers write code manually, their thinking and typing operate in sync. Introducing AI disrupts this harmony. The process requires explaining the problem, evaluating AI-generated suggestions, identifying errors, and refining outputs. For simple tasks like generating a function or test case, the AI’s speed can offset these inefficiencies. But for complex, context-rich work, the back-and-forth often consumes more time than it saves.
Debugging becomes a guessing game
Developers who write their own code develop intuitive theories about potential bugs—they know their own thought process, making errors easier to trace. When AI produces buggy code, those theories disappear. Developers must reverse-engineer code they didn’t write, deciphering logic that stems from probabilistic decision-making rather than deliberate design. According to a 2025 Stack Overflow survey, 45% of developers reported that debugging AI-generated code is significantly more time-consuming, a statistic that cuts across experience levels and tool preferences.
Perception of productivity misaligns with reality
The most striking finding wasn’t the performance data itself but the cognitive dissonance that followed. Participants felt more productive, enjoyed the process more, and expressed willingness to continue using AI tools—despite completing tasks more slowly. One plausible explanation is that AI shifts the subjective experience of work. Typing less feels like doing more. Having a constant stream of suggestions creates the illusion of progress, even when the underlying task takes longer. The cognitive load transitions from creation to orchestration, and orchestration, while mentally taxing, often feels lighter than deep concentration.
Counterexamples reveal broader truths about AI adoption
Dismissing AI coding tools entirely based on METR’s findings would be misleading. Real-world deployments at major companies tell a different story.
Spotify’s experience illustrates how AI’s impact varies by workflow. In December 2025, co-CEO Gustav Söderström reported that senior engineers had shifted from writing code to supervising AI-generated output using Claude Code with Opus 4.5. Spotify also deployed an internal agent called Honk, built on MCP, to autonomously transform code across repositories. This wasn’t a marginal speed improvement—it represented a fundamental shift in how work gets done.
Faros AI, a developer metrics firm, published a counter-study in February 2026 that examined team-level productivity rather than isolated tasks. Their data showed teams with high AI adoption handled 47% more pull requests daily and completed 21% more tasks overall. The gains didn’t come from individual tasks becoming faster. Instead, AI enabled developers to manage multiple workstreams simultaneously, effectively expanding capacity without increasing headcount.
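A bit of arithmetic shows how slower individual tasks and higher team throughput can both be true. The sketch below is a toy model with invented numbers: the `tasks_per_day` helper, the two-hour baseline task, and the 1.5 parallel workstreams are assumptions for illustration and do not come from either study.

```python
def tasks_per_day(hours_per_task, parallel_streams, hours_per_day=8.0):
    """Toy throughput model: tasks finished per day by one developer.

    Assumes workstreams run fully in parallel, which is optimistic;
    real context switching would reduce the benefit.
    """
    return parallel_streams * hours_per_day / hours_per_task


# Hypothetical baseline: one workstream, 2.0 hours per task.
baseline = tasks_per_day(hours_per_task=2.0, parallel_streams=1)

# Hypothetical AI-assisted case: each task takes 19% longer,
# but the developer supervises 1.5 workstreams on average.
with_ai = tasks_per_day(hours_per_task=2.0 * 1.19, parallel_streams=1.5)

change = (with_ai / baseline - 1) * 100
print(f"Throughput change: {change:+.0f}%")
```

Under these assumed numbers throughput rises even though every task takes longer, which is why a per-task study and a team-level study can point in opposite directions.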
Reconciling conflicting data with a practical framework
The contradiction between METR’s findings and real-world success stories isn’t a flaw in the studies—it reflects the complexity of measuring productivity in AI-assisted development. The key lies in understanding what each study actually measured.
Think of AI coding tools as a new kind of vehicle. A bicycle is faster than walking on flat terrain but slower than walking up steep stairs. Similarly, AI’s effectiveness depends entirely on context:
- Where AI genuinely accelerates work:
  - Greenfield projects with minimal constraints
  - Repetitive boilerplate tasks like CRUD endpoints or test scaffolding
  - Exploring unfamiliar languages or frameworks where tribal knowledge is scarce
- Where AI introduces friction:
  - Complex, tightly coupled systems where context is critical
  - High-stakes debugging where assumptions about code logic are essential
  - Work requiring deep domain expertise that AI hasn’t yet mastered
The takeaway isn’t that AI coding tools are inherently good or bad—it’s that their value depends on how and where they’re applied. Organizations that treat AI as a one-size-fits-all solution risk disappointment. Those that integrate it strategically, matching tools to appropriate tasks, stand to gain real advantages. The future of development won’t be AI replacing developers, but developers using AI to scale their impact beyond what was previously possible.
AI summary
If you think AI-powered tools increase coding speed, a new study may surprise you. According to the METR study published in July 2025, experienced developers worked 19% slower when using AI. So why does this perception gap arise?