
Open-source adversarial code review system for AI agents now available

A new open-source solution introduces a four-agent adversarial review system that lets AI coding agents critique each other’s work programmatically. Built on heym and exposed as an MCP server, it provides structured second opinions for autonomous code generation workflows.


The rise of AI coding assistants has transformed how developers review code—but who reviews the reviewers? A new open-source project introduces a four-agent adversarial review system designed to provide unbiased, structured critiques of AI-generated code.

This workflow, built on the heym platform, exposes itself as an MCP server, allowing any coding agent—whether Cursor, Claude Code, or Codex—to request a peer review before finalizing its output. The system is MIT-licensed and available on GitHub, offering a scalable way to automate critical oversight in autonomous development pipelines.

Git serves as the backbone of agent workflows

Modern AI coding tools now rely on Git as their primary control loop. Systems like Andrej Karpathy’s AutoResearch commit changes, roll back failures, and generate pull requests just as human developers would. The GitHub Action workflows behind tools like Claude Code operate on the same principle: an agent produces a deliverable indistinguishable from human work—a branch, a diff, or a PR—ready for review.
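As a rough illustration of that control loop, the sketch below shows how an agent-side wrapper might stage its edits on a branch and roll back when checks fail. The helper names and the `run_checks` hook are hypothetical; this is not how AutoResearch or Claude Code are actually implemented, only the shape of the Git loop the article describes.

```python
import subprocess

def git(*args: str) -> None:
    """Run a git command and fail loudly so the agent loop can react."""
    subprocess.run(["git", *args], check=True)

def deliver_change(branch: str, message: str, run_checks) -> bool:
    """Hypothetical agent loop: commit work on a branch, roll back on failure.

    `run_checks` is a placeholder for whatever test or lint step the agent uses.
    """
    git("checkout", "-b", branch)
    git("add", "-A")
    git("commit", "-m", message)
    if not run_checks():
        # Verification failed: discard the attempt and return to the main branch.
        git("checkout", "main")
        git("branch", "-D", branch)
        return False
    # Success: the branch is ready to be pushed and opened as a PR for review.
    git("push", "-u", "origin", branch)
    return True
```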

This evolution didn’t require reinventing the wheel. Git was already the preferred artifact for evaluating code quality. AI agents simply adopted a 20-year-old workflow already familiar to engineering teams. The challenge now lies in ensuring those agents can critically evaluate each other’s output at scale.

Peer review by agents, not just for agents

Today, when an AI agent generates a pull request, the final review still falls to a human. That works for low-volume workflows but breaks down at agent speed. The logical next step is having agents review each other—but most implementations use a single model role-playing multiple reviewers, which risks self-approval and critiques raised only for show.

Existing tools like CodeRabbit, Greptile, and Qodo specialize in reviewing human pull requests on GitHub. They operate as vertical SaaS bots, not as programmable primitives that other agents can invoke. The solution introduced this week fills that gap: an open, adversarial code review system designed specifically as a callable layer beneath those tools.

A workflow that structurally resists bias

The system consists of four distinct agents, each with a specialized role and locked to a specific cognitive scaffold via Ejentum’s harness API. One architect agent coordinates the process but cannot invent concerns; it only synthesizes evidence from specialists. The other three agents—each running a different base model (Anthropic, Google, Alibaba)—focus on reasoning, implementation, and anti-deception checks.

  • The reasoner decomposes potential failure angles using a reasoning harness.
  • The implementer writes verification tests to validate the proposed changes.
  • The reviewer rejects superficial framing and demands concrete evidence for approval.

Cross-lab diversity reduces correlated failure modes, though biases in pretraining and synthesis remain. The architect produces a structured verdict with severity ratings, ensuring no single model can rubber-stamp its own work.

For example, when tested on a "quick refactor" that swapped exception handling for default returns, the implementer caught the behavioral change by writing a test for the original exception. The reviewer flagged the misleading framing, and the architect issued a request_changes verdict with high severity, with every concern supplied by the specialists rather than invented by the architect itself.
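To make the "synthesizes, never invents" constraint concrete, here is a minimal sketch of how an architect step could be limited to aggregating specialist findings into a verdict like the one above. The data shapes, field names, and severity scale are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """A single piece of evidence reported by a specialist agent."""
    agent: str      # "reasoner", "implementer", or "reviewer"
    severity: str   # assumed scale: "low", "medium", "high"
    evidence: str   # concrete observation, such as a failing verification test

def architect_verdict(findings: list[Finding]) -> dict:
    """Synthesize specialist findings into a structured verdict.

    The architect only ranks and aggregates what the specialists reported;
    in this sketch it has no way to add concerns of its own.
    """
    if not findings:
        return {"verdict": "approve", "severity": None, "evidence": []}
    order = {"low": 0, "medium": 1, "high": 2}
    worst = max(findings, key=lambda f: order[f.severity])
    return {
        "verdict": "request_changes",
        "severity": worst.severity,
        "evidence": [f"{f.agent}: {f.evidence}" for f in findings],
    }
```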

heym turns workflows into callable primitives

heym functions similarly to n8n but with first-class support for multi-agent orchestration. Its Docker-based setup allows self-hosting, and each workflow can be exposed as its own MCP server. That means the four-agent review system isn’t just a static template—it’s a programmable subroutine that any AI coding agent can invoke mid-task.

Developers can integrate it into Cursor, Claude Code, AutoResearch loops, or custom Python pipelines. After an agent completes its work, it calls the review team, receives a structured verdict, and acts on the feedback. This layer—the open primitive beneath vertical SaaS bots—was missing until now.
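As one hypothetical way to wire this into a custom pipeline, the sketch below posts a diff to a self-hosted review endpoint and branches on the verdict. The URL, payload fields, and verdict shape are all assumptions for illustration; the actual MCP tool name and schema live in the project's setup guide.

```python
import requests

# Hypothetical endpoint; the real address comes from the self-hosted heym
# instance's MCP server configuration.
REVIEW_URL = "http://localhost:5678/mcp/adversarial-review"

def request_review(diff: str, description: str) -> dict:
    """Send a completed change to the four-agent review team and return its verdict."""
    response = requests.post(
        REVIEW_URL,
        json={"diff": diff, "description": description},
        timeout=300,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    verdict = request_review(diff="...", description="Swap exception handling for defaults")
    if verdict.get("verdict") == "request_changes":
        # Feed the structured critique back into the coding agent's next attempt.
        print("Revision needed:", verdict.get("evidence", []))
    else:
        print("Approved; open the pull request.")
```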

Open-source availability and next steps

The complete workflow, system prompts, verification tests, and setup guide are available on GitHub under the MIT license. A one-click import option is also available on the heym template marketplace. The project invites contributions, critiques, and real-world testing to refine the adversarial review process further.

As AI agents assume greater autonomy in code generation, the need for robust, unbiased review mechanisms grows. This open system offers a foundation—not a perfect solution—but one that acknowledges its own limitations while providing a scalable way forward.

AI summary

An open-source code review system running four distinct agents lets the output of AI tools be checked against a second opinion. Details of the heym-based solution and how to use it.
