Long-running AI agents are shifting from experimental tools to practical infrastructure—but most orchestration systems aren’t equipped to handle them. While many agentic systems today operate for seconds or minutes, Moonshot AI’s latest model, Kimi K2.6, demonstrates that agents can now function autonomously for hours or even days, performing tasks like monitoring, incident response, and complex engineering work without human intervention.
The Shift from Short-Lived Tasks to Continuous Execution
Most orchestration frameworks were designed for agents that complete a task and terminate within a narrow time window. Tools like Anthropic’s Claude Code and OpenAI’s Codex introduced early support for longer-running agents through mechanisms such as multi-session tasks and background execution. Yet even these approaches often assume agents remain within predefined, bounded workflows, an assumption that breaks down when agents operate continuously.
Kimi K2.6 challenges this paradigm by optimizing for stateful, long-horizon execution. Moonshot AI reports internal deployments where agents ran for 13 hours, iterating through 12 optimization strategies and executing over 1,000 tool calls to modify 4,000 lines of code. In one case, an agent operated autonomously for five consecutive days, handling monitoring, incident response, and system operations without human oversight.
Why Traditional Orchestration Struggles with Stateful Agents
The core issue lies in state management. Long-running agents must maintain context across dynamic environments, calling different tools, APIs, and databases as their tasks evolve. Most current agents, even those that run only in short bursts, perform this kind of tool switching, but within tightly controlled workflows that last seconds or minutes. Once an agent’s runtime extends to hours or days, keeping that state consistent becomes far harder.
Orchestration frameworks that rely on static role definitions or pre-programmed workflows quickly reveal their limitations. For example:
- They lack mechanisms to handle real-time state changes without manual intervention.
- Recovery from failures becomes ambiguous, with no clear rollback strategies for partially completed tasks.
- Dynamic plan adjustments, where an agent iteratively refines its approach, fall outside the scope of traditional frameworks.
Maxim Saplin, a practitioner experimenting with long-horizon agents, highlighted this fragility in a recent blog post:
“Subagents aren’t inherently useless, but orchestration remains brittle. Right now, it’s more of a product and training challenge than something solvable with better prompts.”
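To make the failure-recovery gap concrete, here is a minimal sketch of one common mitigation: checkpointing agent state after every completed step so a crashed run can resume rather than restart. It is illustrative only and not drawn from Moonshot AI or any framework named in this article. The names `AgentState`, `save_checkpoint`, `load_checkpoint`, and `run_plan` are hypothetical, and the sketch assumes each step is safe to retry.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field, asdict


@dataclass
class AgentState:
    """Hypothetical durable state for one long-running task."""
    task_id: str
    completed_steps: list = field(default_factory=list)
    notes: dict = field(default_factory=dict)  # JSON-serializable step results


def save_checkpoint(state: AgentState, path: str) -> None:
    # Write to a temp file in the same directory, then rename over the target,
    # so a crash mid-write never leaves a half-written checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(asdict(state), f)
    os.replace(tmp_path, path)


def load_checkpoint(path: str, task_id: str) -> AgentState:
    # Start fresh if no checkpoint exists yet.
    if not os.path.exists(path):
        return AgentState(task_id=task_id)
    with open(path) as f:
        return AgentState(**json.load(f))


def run_plan(plan, path: str, task_id: str) -> AgentState:
    """Run (name, step) pairs in order, skipping steps already checkpointed."""
    state = load_checkpoint(path, task_id)
    for name, step in plan:
        if name in state.completed_steps:
            continue  # finished in an earlier run
        # If the step raises, the checkpoint on disk still reflects the
        # last step that completed successfully.
        state.notes[name] = step(state)
        state.completed_steps.append(name)
        save_checkpoint(state, path)
    return state
```

Calling `run_plan` again with the same checkpoint path after a failure picks up at the first unfinished step. The sketch does not undo side effects of a step that failed midway, which is exactly the rollback ambiguity described in the list above.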
Enterprise Risks Outpace Governance Capabilities
The gap between agent capabilities and governance tools is widening. Mark Lambert, CPO at ArmorCode, warns that agentic systems can now generate code and system changes faster than most organizations can review or remediate.
“This demands more than additional scanning. Organizations need stronger AI governance frameworks that provide context, prioritization, and accountability to manage risks before they accumulate into exposure.”
Kunal Anand, CPO at F5, frames the shift as a fundamental architectural overhaul. He compares it to the earlier transitions from scripts to services to containers, each of which introduced new infrastructure categories and forced a rethink of infrastructure design.
“We’re moving to agents as persistent infrastructure, creating entirely new categories like agent runtime, gateway, identity providers, and mesh. The API gateway model is evolving to understand goals and workflows, not just endpoints.”
Beyond Benchmarks: Real-World Deployments of Kimi K2.6
Moonshot AI positions K2.6 as a model built for tasks that traditionally demand weeks or months of human effort. In one internal test, the model autonomously developed a full SysY compiler from scratch in just 10 hours—a feat the company equates to the work of four engineers over two months. The resulting compiler passed all 140 functional tests without human input.
Another deployment involved overhauling an eight-year-old open-source financial matching engine. The agent spent 13 hours working through 12 optimization strategies, making over 1,000 tool calls to modify 4,000 lines of code. These examples show long-horizon agents moving from theoretical constructs to practical tools, provided they can be orchestrated effectively.
The Path Forward for Agentic Workflows
The rise of long-running agents signals a new era for AI-driven infrastructure, but it also exposes a critical imbalance. Model capabilities are advancing faster than the tools needed to manage them. Enterprises exploring agentic systems must prioritize robust orchestration frameworks that can handle stateful execution, real-time adaptation, and clear governance. Without these, even the most capable models risk becoming unmanageable liabilities rather than transformative assets.


