iToverDose / Software · 5 May 2026 · 00:01

Claude Code in Product Engineering: A Scalable AI Workflow Strategy

Discover how to move beyond single-file coding with AI agents by structuring workflows that span ticket management and cross-repository changes. Learn the architecture that keeps AI useful without sacrificing reliability.

DEV Community · 5 min read

Most AI coding assistants operate in silos, offering suggestions within a single file or editor session. While this approach can boost individual productivity, it fails to address the complexities of real-world product engineering—where development spans multiple repositories, involves cross-team collaboration, and requires long-term knowledge retention.

To bridge this gap, one developer has designed and deployed a system that integrates Claude Code across the full software development lifecycle. After months of daily operation, the system demonstrates how AI can safely and effectively support complex engineering workflows without becoming a liability. Here’s how it works, why it’s structured this way, and the hard lessons learned from early failures.

The core philosophy can be summed up in a single statement: AI agents should focus on judgment calls, not mechanical tasks. This principle underpins every decision in the system’s design.

Why AI Agents Fail in Engineering Workflows

A common mistake in AI-driven development is overloading agents with both high-level reasoning and low-level execution. For example, an agent might be tasked with writing code, committing it to a branch, running tests, and pushing changes—all in one continuous loop. This conflation creates three major problems:

  • Token waste: Every API call, filesystem operation, or test execution consumes expensive LLM tokens, driving up costs and latency.
  • Unreliability: Mechanical steps are deterministic by nature; routing them through a probabilistic agent introduces unnecessary failure points.
  • Lack of auditability: When an agent performs irreversible actions (like merging code), tracing decisions becomes difficult, increasing risk.

The solution is to separate judgment from execution. Mechanical operations should be handled by deterministic scripts, while agents are reserved for tasks that require nuanced decision-making.
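This split can be sketched in a few lines of Python. `run_checks` stands for the deterministic side, and the list of failures is the only thing that would ever be handed to an agent; the function names and check commands are hypothetical, not part of the system described here:

```python
import subprocess
import sys

def run_checks(commands):
    """Run mechanical steps (tests, linters, formatters) as plain
    subprocesses: deterministic, auditable, and token-free."""
    results = {}
    for name, cmd in commands.items():
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[name] = proc.returncode == 0
    return results

def needs_judgment(results):
    """Only failed checks are worth an (expensive) agent call."""
    return [name for name, ok in results.items() if not ok]

if __name__ == "__main__":
    checks = run_checks({
        "tests": [sys.executable, "-c", "pass"],
        "lint": [sys.executable, "-c", "raise SystemExit(1)"],
    })
    print(needs_judgment(checks))  # only the failing check reaches the agent
```

Everything above the `needs_judgment` boundary stays in scripts; only the judgment call crosses into the agent.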

The Six-Layer Architecture Behind the System

The architecture is designed to balance autonomy with control, ensuring AI augments rather than disrupts engineering workflows. The system is structured into six distinct layers, each with a specific role:

┌────────────────────────────────────────────────────────────┐
│ 1. User Interface & Dashboard                              │
├────────────────────────────────────────────────────────────┤
│ 2. Skill Command Routing                                   │
├────────────────────────────────────────────────────────────┤
│ 3. Orchestrator (Python-based workflow engine)             │
├────────────────────────────────────────────────────────────┤
│ 4. Agent Layer (Claude Code with specialized sub-agents)   │
├────────────────────────────────────────────────────────────┤
│ 5. Persistent Knowledge Layer (SQLite, Markdown, ChromaDB) │
├────────────────────────────────────────────────────────────┤
│ 6. External Integrations (Jira, GitLab, Confluence, K8s)   │
└────────────────────────────────────────────────────────────┘
  • Layers 1–3 handle deterministic workflows using Python scripts. These layers manage file operations, API calls, and test executions without involving the AI agent.
  • Layer 4 is where Claude Code operates. The agent is invoked only when judgment is required, such as writing code, evaluating review feedback, or resolving architectural trade-offs.
  • Layer 5 stores contextual knowledge—past decisions, ticket history, and architectural guidelines—in a persistent wiki and operational database. This ensures the system doesn’t re-derive the same context across sessions.
  • Layer 6 integrates with external tools like Jira for ticket management, GitLab for version control, and Kubernetes for deployment, keeping the system in sync with the broader engineering environment.
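A minimal sketch of Layer 5's persistence, assuming a single SQLite table of past decisions keyed by ticket; the schema and helper names are illustrative, and the real layer also spans Markdown files and ChromaDB:

```python
import sqlite3

def open_knowledge(path=":memory:"):
    """Open (or create) the operational database of past decisions."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS decisions (ticket TEXT, note TEXT)")
    return db

def record(db, ticket, note):
    """Persist a decision so later sessions never re-derive it."""
    db.execute("INSERT INTO decisions VALUES (?, ?)", (ticket, note))

def recall(db, ticket):
    """Fetch every prior decision recorded against a ticket."""
    rows = db.execute(
        "SELECT note FROM decisions WHERE ticket = ?", (ticket,)
    ).fetchall()
    return [note for (note,) in rows]
```

The point of the layer is the `recall` call: a new session starts from stored context instead of an empty prompt.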

Specialized sub-agents operate within isolated contexts. For instance, a code review agent can analyze pull requests but lacks permissions to modify files directly. This containment reduces risk while enabling focused AI assistance.
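That containment can be modeled as a capability set checked before every tool call; the agent and action names below are hypothetical, not the system's real identifiers:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentScope:
    """A sub-agent's identity plus the only actions it may perform."""
    name: str
    allowed_actions: frozenset

    def check(self, action: str) -> bool:
        """Gatekeeper consulted before any tool invocation."""
        if action not in self.allowed_actions:
            raise PermissionError(f"{self.name} may not perform {action!r}")
        return True

# The review agent can read diffs and comment, but never write or push.
review_agent = AgentScope("code-review", frozenset({"read_diff", "post_comment"}))
```

Because the scope is frozen at construction, an agent cannot widen its own permissions mid-session, which is what keeps the blast radius small.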

Orchestrated vs. Agent-Native Skills: A Clear Divide

Not every task requires the full orchestration pipeline. The system classifies skills into two categories based on their need for external side effects:

  • Orchestrated skills involve multi-step workflows with potential for irreversible changes. Examples include:
      • Implementing a ticket that spans multiple files
      • Creating and pushing feature branches
      • Running integration tests and deploying code
      • Remediating review feedback that requires code edits

These tasks benefit from deterministic Python orchestrators that coordinate mechanical steps while leaving judgment to the agent.

  • Agent-native skills are reasoning-heavy but lack side effects. Examples include:
      • Debugging a service issue by analyzing logs
      • Classifying an unknown input format
      • Summarizing standup notes from raw data

For these tasks, the agent operates directly without an orchestrator, reducing overhead and complexity.

The decision to add orchestration should be deliberate. If the mechanical steps are simple (e.g., running a linter), the agent can handle them. But if the steps are complex or risky, an orchestrator ensures reliability and auditability.
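The divide reduces to one question per skill: can it cause external side effects? A hypothetical skill registry (the skill names are illustrative) makes the routing rule explicit:

```python
# True = can cause external side effects, so it gets the orchestrator.
SKILLS = {
    "implement_ticket": True,    # edits files, pushes branches
    "deploy_code": True,         # irreversible without a rollback
    "debug_logs": False,         # reasoning over read-only data
    "summarize_standup": False,  # pure text transformation
}

def route(skill: str) -> str:
    """Pick the execution path for a skill by its side-effect flag."""
    if SKILLS[skill]:
        return "orchestrator"  # deterministic Python pipeline + agent
    return "agent"             # direct agent invocation, no pipeline
```

Keeping the flag in a registry, rather than deciding per invocation, makes the orchestration decision deliberate and reviewable.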

From Ticket to Merge Request: A Step-by-Step Lifecycle

To illustrate how the system works in practice, let’s trace a single ticket from creation to deployment:

  1. Trigger: A developer initiates the process by issuing a command like /ticket 12345 in the CLI or dashboard.
  2. Phase 1: Context Assembly (Orchestrator)
      • The Python orchestrator fetches the Jira ticket details.
      • It searches the persistent wiki for past architectural decisions related to the feature.
      • A new Git worktree and feature branch are created.
      • The orchestrator compiles an implementation brief summarizing requirements, constraints, and relevant context.
      • This brief is delivered to the agent as a structured JSON payload.
  3. Phase 2: Implementation (Agent)
      • Claude Code receives the brief and the relevant codebase files.
      • It drafts the necessary code changes, adhering to project standards.
      • The agent may iterate multiple times, refining its approach based on feedback from tests or reviews.
  4. Phase 3: Validation (Orchestrator + Review Agent)
      • The orchestrator runs automated checks: tests, linters, and formatters.
      • If failures occur, the issue is looped back to the agent for remediation (up to three attempts).
      • A specialized code review agent analyzes the changes, flagging potential issues or suggesting improvements.
      • Blockers or critical feedback are escalated back to the primary agent for resolution.
  5. Phase 4: Proposal and Deployment (Human-in-the-Loop)
      • The orchestrator generates a structured proposal summarizing the changes, test results, and review feedback.
      • The proposal appears in the dashboard for human review.
      • Upon approval, the orchestrator executes the final steps: pushing the code, creating a merge request, and logging the activity to an audit trail.

This human-in-the-loop approach ensures that AI augments—but never replaces—human oversight, especially for actions with permanent consequences.
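Stripped of the real integrations, the control flow of the four phases reduces to a single loop. Every callable below is a stub standing in for an orchestrator step or agent call, not the system's actual interfaces:

```python
MAX_REMEDIATION_ATTEMPTS = 3  # matches the "up to three attempts" rule above

def run_ticket(ticket_id, fetch_context, implement, validate, propose):
    """Ticket lifecycle: assemble context, let the agent implement,
    validate deterministically, then hand a proposal to a human."""
    brief = {"ticket": ticket_id, "context": fetch_context(ticket_id)}
    for attempt in range(1, MAX_REMEDIATION_ATTEMPTS + 1):
        change = implement(brief)    # agent judgment
        failures = validate(change)  # deterministic checks
        if not failures:
            return propose(change)   # a human approves from here
        brief["failures"] = failures # feed failures back to the agent
    raise RuntimeError(
        f"ticket {ticket_id}: checks still failing after "
        f"{MAX_REMEDIATION_ATTEMPTS} attempts; escalate to a human"
    )
```

Note that `propose` is the boundary: the loop can retry implementation, but nothing irreversible happens without the human approval that follows the proposal.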

Key Takeaways and Lessons Learned

Building this system has revealed several critical insights:

  • Judgment-first design reduces costs: By reserving AI for high-value decisions, token usage drops significantly while maintaining performance.
  • Persistence beats recency: Storing engineering context in a wiki and database prevents redundant research and ensures consistency across sessions.
  • Safety through isolation: Specialized agents with scoped permissions minimize the blast radius of AI errors.
  • Over-orchestration backfires: Adding orchestration layers where they’re unnecessary increases maintenance burden and failure modes.

The system isn’t just a technical experiment—it’s a blueprint for integrating AI into product engineering at scale. By treating the AI agent as a consultant rather than a worker, teams can leverage its strengths without compromising reliability. As AI tools evolve, architectures like this one will become essential for teams aiming to balance innovation with stability.

The next frontier? Extending this model to real-time collaboration, where multiple agents and humans work in tandem across distributed teams. The foundation is already here—it’s about scaling it thoughtfully.

