AI agents hit reliability wall: Why 2.0 rebuilds are now essential

Enterprise AI agents are hitting an inflection point, not because of model limitations but because workflows designed for short bursts are failing in long-running production environments. After an initial wave of rapid deployment, organizations are discovering that reliability, state management, and cost control matter more than raw model performance. This realization is driving a surge in "version 2.0" rebuilds, in which teams revisit hastily assembled agent architectures to address persistent failures, hidden costs, and unmanaged state.

According to Preeti Somal, Senior Vice President of Engineering at Temporal Technologies, this shift reflects a deeper engineering reckoning. "We see many customers coming to us after their first attempt," she noted during the latest AI Impact Series event in New York. "They moved fast, but they didn’t build the foundation. Now, when things crash, they’re forced to rebuild with reliability in mind."

The core challenge is not the agents themselves but the workflows they power. AI agents today often execute multi-step processes that span LLMs, APIs, retrieval systems, and external tools over hours or days. A single failure can cascade into runaway costs, data loss, or service disruptions. Somal emphasized that these problems are not new; AI just magnifies them. "The patterns are familiar," she said. "But agentic AI supercharges their impact."

The hidden complexity of long-running workflows

Long-running AI workflows introduce engineering concerns that manifest only after deployment. Teams often assume agents will run to completion, but real-world systems face interruptions, timeouts, and dependency failures. "People write agents without considering what happens if the system crashes," Somal explained. "Do you restart the entire flow, or can you recover from where it left off?"
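The restart-or-recover question can be made concrete with a minimal sketch. Everything here is an illustrative assumption rather than any vendor's API: the `run_workflow` helper, the JSON checkpoint file, and the convention that each step receives the results collected so far. The only point is that progress is written down after every step, so a rerun skips work that already finished.

```python
import json
import os

def run_workflow(steps, checkpoint_path):
    """Run (name, fn) steps in order, skipping any already recorded as done."""
    # Load earlier progress if a checkpoint file exists (hypothetical format).
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            results = json.load(f)
    else:
        results = {}

    for name, fn in steps:
        if name in results:
            continue  # finished before the crash, so do not run it again
        results[name] = fn(results)
        # Persist after every step so a crash loses at most one step of work.
        with open(checkpoint_path, "w") as f:
            json.dump(results, f)
    return results
```

If the process dies during the third step, calling `run_workflow` again with the same checkpoint path reruns only that step and the ones after it. Step results must be JSON-serializable for this particular sketch to work.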

Cost is another hidden pain point. Restarting failed workflows triggers repeated inference calls, driving up token spend and latency. Somal compared this moment to the early days of cloud migration, when organizations "lifted and shifted" workloads without modernizing architectures. "They realized they were spending more on cloud without getting value," she said. "The same lesson applies to AI agents now."
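A back-of-the-envelope calculation shows where the extra token spend comes from. The step sizes below are made-up illustrative numbers, not measurements, and the sketch assumes a single failure in which the failed attempt still consumes that step's tokens.

```python
def tokens_spent(step_tokens, fail_at, resume):
    """Total tokens when step index `fail_at` fails once and then succeeds.

    step_tokens: tokens consumed by each step, in order.
    resume: True to retry from the failed step, False to restart from step 0.
    """
    # First pass runs up to and including the step that fails.
    first_attempt = sum(step_tokens[: fail_at + 1])
    if resume:
        second_attempt = sum(step_tokens[fail_at:])
    else:
        second_attempt = sum(step_tokens)
    return first_attempt + second_attempt

steps = [2000, 8000, 5000, 1000]  # illustrative numbers only
restart = tokens_spent(steps, fail_at=2, resume=False)
resumed = tokens_spent(steps, fail_at=2, resume=True)
print(restart, resumed)  # prints: 31000 21000
```

The 10,000-token gap is exactly the cost of the two steps that had already succeeded and were paid for a second time.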

State vs. memory: Why the distinction matters

As agents evolve beyond simple chatbots into autonomous business processes, two concepts—state and memory—become critical. While often conflated, they serve distinct roles:

State tracks execution progress: which steps have completed, where recovery should resume, and what actions remain pending.
Memory preserves contextual information carried across interactions, such as conversation history or user preferences.

Somal highlighted this distinction using a healthcare example with Abridge, where agent workflows process physician visits through multiple stages: audio transcription, summarization, model inference, and after-visit report generation. "It’s not a single call," she said. "It’s a sequence of coordinated steps, each dependent on the previous one."

In long-running workflows, losing state means starting over. Losing memory means repeating context. Neither is acceptable in enterprise systems where continuity and compliance are non-negotiable.
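One way to keep the two concepts apart is to give each its own data structure. The sketch below is an assumption for illustration: the class names, fields, and stage names (borrowed loosely from the visit-processing example) are hypothetical and do not describe Abridge's or Temporal's actual design.

```python
from dataclasses import dataclass, field

@dataclass
class ExecutionState:
    """Where the workflow is. Used to decide where recovery resumes."""
    completed: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def next_step(self):
        return self.pending[0] if self.pending else None

    def mark_done(self):
        # Move the current step from pending to completed.
        self.completed.append(self.pending.pop(0))

@dataclass
class Memory:
    """What the agent knows. Used to rebuild context for the model."""
    history: list = field(default_factory=list)
    preferences: dict = field(default_factory=dict)

state = ExecutionState(pending=["transcribe", "summarize", "infer", "report"])
memory = Memory(preferences={"report_language": "en"})

state.mark_done()                          # transcription finished
memory.history.append("transcript ready")  # context for later steps
```

After a crash, `state.next_step()` answers "what do I run next?", while `memory` answers "what did I already learn?". Persisting one without the other recovers only half of the picture.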

Building a deterministic spine for AI workflows

To address these challenges, Somal advocates a "deterministic spine": a reliable orchestration layer that surrounds probabilistic AI models. This spine keeps workflows alive through failures, retries failed steps, and resumes from the point of interruption.

"The deterministic spine is the path you define," she explained. "It calls the brain—the LLM—but if the brain doesn’t respond, it retries. If the next step would fail, it picks up where the failure occurred."

In this model, the LLM is the nondeterministic component, while the orchestration layer enforces consistency. This balance is essential for enterprise processes like procurement approvals, healthcare documentation, or regulatory compliance, where silent failures are unacceptable.
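The spine-and-brain split can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Temporal's implementation: `call_with_retry` and `spine` are hypothetical names, `llm` stands for any callable that takes a prompt and returns text, and the two-step approval flow is invented for the example.

```python
import time

def call_with_retry(fn, attempts=3, base_delay=0.1):
    """Call fn(); on an exception, retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure, never hide it
            time.sleep(base_delay * (2 ** attempt))

def spine(llm, document):
    """Fixed sequence of steps; only the llm calls are probabilistic."""
    summary = call_with_retry(lambda: llm(f"Summarize: {document}"))
    decision = call_with_retry(lambda: llm(f"Approve or reject: {summary}"))
    return {"summary": summary, "decision": decision}
```

The order of steps never depends on what the model says, and a model call that fails is retried on its own rather than restarting the sequence. A production orchestrator would also persist progress durably; this sketch shows only the retry boundary.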

"Your priority is recovery," Somal said. "You don’t want to pay the token tax for a failure."

The cost of invisibility: Why observability is non-negotiable

As enterprises scale AI agents, hidden costs and operational blind spots emerge. Long-running workflows often involve dozens of model calls across APIs, retrieval systems, and third-party tools. Without visibility, teams struggle to trace where token spend accumulates or why a workflow stalled.

Orchestration platforms provide a "single pane of glass" to monitor each step in real time. Somal described this as a game-changer for cost control and debugging. "You gain visibility into the entire flow," she said. "You can trace failures, measure latency, and optimize token usage—before costs spiral out of control."
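Step-level visibility of this kind can be approximated with a small tracing wrapper. The `StepTracer` class below is a hypothetical helper written for illustration, and it assumes each step returns a `(result, tokens_used)` pair; real orchestration platforms expose this information through their own tooling.

```python
import time
from collections import defaultdict

class StepTracer:
    """Collects status, latency, and token counts for each step run."""

    def __init__(self):
        self.records = []

    def record(self, step, fn):
        start = time.perf_counter()
        status, tokens = "ok", 0
        try:
            result, tokens = fn()  # fn returns (result, tokens_used)
            return result
        except Exception:
            status = "failed"
            raise
        finally:
            # Runs on success and on failure, so no step goes unlogged.
            self.records.append({
                "step": step,
                "status": status,
                "tokens": tokens,
                "seconds": time.perf_counter() - start,
            })

    def tokens_by_step(self):
        totals = defaultdict(int)
        for r in self.records:
            totals[r["step"]] += r["tokens"]
        return dict(totals)
```

Wrapping each step as `tracer.record("summarize", lambda: ...)` yields a per-step record of where tokens and time went, and `tokens_by_step()` shows which stage dominates spend.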

This transparency is becoming a competitive advantage. Teams that invest in observability can iterate faster, reduce waste, and ensure agents remain reliable under pressure. For others, the cost of invisibility may be the first sign that their AI strategy needs a rebuild.

Looking ahead, the next phase of enterprise AI won’t be defined by smarter models, but by more resilient architectures. As Somal put it: "The rush to deploy is over. Now, it’s about building systems that last."

AI summary

The reliability problems of AI agents in production are forcing companies to rethink their architecture. Managing long-running workflows and making costs transparent form the foundation of next-generation AI systems.
