
Build AI coding agents with per-stage LLM routing for pennies

An open-source coding agent routes each pipeline stage to the most cost-efficient LLM for the job, without sacrificing quality. See how one tool cuts cloud spend to fourteen cents per PR.


AI coding tools often lock users into a single model per subscription, leaving swappable LLMs underutilized. This rigidity forces teams to either overpay for premium models or settle for subpar performance across all workflow stages. A new open-source solution flips the script by dynamically routing each phase of the development pipeline to the most cost-effective LLM for the task at hand.

The project, named Anvil, redefines AI-driven coding by treating large language models as interchangeable tools rather than fixed subscriptions. Instead of forcing developers to choose one vendor’s model for an entire feature request, Anvil assigns different LLMs to each step—from planning to testing—based on the specific capability required. The result? Faster, cheaper, and higher-quality code delivery without vendor lock-in.

One pipeline, multiple models: the magic of per-stage routing

Anvil’s standout feature is its per-stage model routing, which divides a single feature request into distinct pipeline stages, each handled by the most suitable LLM. For example, a recent pipeline run demonstrates how this works in practice:

clarify → Ollama qwen3:14b (local) ~ $0.00
plan → Claude Sonnet 4.6 (deep analysis) ~ $0.05
build → Ollama qwen3:14b (local) ~ $0.00
test → Ollama qwen3:14b (local) ~ $0.00
validate → Claude Haiku 4.5 (cheap, fast) ~ $0.01
review → Claude Sonnet 4.6 (judgment) ~ $0.08
ship → Ollama qwen3:14b (local git ops) ~ $0.00
──────────
~ $0.14 total

In this run, local models handled the bulk of the work—clarification, building, testing, and shipping—while premium models were reserved for planning and final review, where their advanced reasoning capabilities delivered measurable value. The entire process cost just fourteen cents in cloud spend, proving that smart model allocation can slash expenses without compromising outcomes.

Configuration-driven flexibility: no hardcoded logic

Unlike proprietary tools, Anvil’s routing isn’t baked into the codebase. Instead, it’s defined in a simple YAML file, allowing users to tailor the pipeline to their needs:

# ~/.anvil/stage-policy.yaml
stages:
  clarify:
    capability: reasoning
    complexity: S
    prefer: [local, cheap, premium]
  plan:
    capability: reasoning
    complexity: L
    prefer: [premium]
  build:
    capability: code
    complexity: M
    prefer: [local, cheap, premium]
  review:
    capability: reasoning
    complexity: L
    prefer: [premium]

Each stage declares:

  • The required capability (e.g., code generation or reasoning).
  • The complexity level (S, M, L for small, medium, large).
  • A preference order for model tiers (local, cheap, premium).

Anvil’s resolver then scans a companion file, ~/.anvil/models.yaml, to select the cheapest model that meets the stage’s requirements. This setup ensures that high-cost models are only used when absolutely necessary, while cheaper alternatives handle routine tasks.
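
The article doesn’t show the resolver itself, but the core idea fits in a few lines of JavaScript. The model catalog below is illustrative, not Anvil’s actual models.yaml schema, and the costs are placeholders:

// Hypothetical catalog, loaded from ~/.anvil/models.yaml
const models = [
  { name: 'qwen3:14b', tier: 'local', capabilities: ['code', 'reasoning'], maxComplexity: 'M', costPer1kTokens: 0 },
  { name: 'claude-haiku-4.5', tier: 'cheap', capabilities: ['code', 'reasoning'], maxComplexity: 'M', costPer1kTokens: 0.001 },
  { name: 'claude-sonnet-4.6', tier: 'premium', capabilities: ['code', 'reasoning'], maxComplexity: 'L', costPer1kTokens: 0.01 },
];

const rank = { S: 1, M: 2, L: 3 };

// Walk the stage's preference order and return the cheapest model in the
// first tier that offers the required capability at the required complexity.
function resolveModel(stage) {
  for (const tier of stage.prefer) {
    const candidates = models
      .filter(m => m.tier === tier)
      .filter(m => m.capabilities.includes(stage.capability))
      .filter(m => rank[m.maxComplexity] >= rank[stage.complexity])
      .sort((a, b) => a.costPer1kTokens - b.costPer1kTokens);
    if (candidates.length > 0) return candidates[0];
  }
  throw new Error('No model satisfies the stage requirements');
}

// The plan stage prefers premium only, so it resolves to the Sonnet entry.
resolveModel({ capability: 'reasoning', complexity: 'L', prefer: ['premium'] });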

Eight providers, zero vendor SDKs: true agnosticism

Anvil ships with adapters for eight major LLM providers, including:

  • Claude
  • OpenAI
  • Gemini
  • OpenRouter
  • OpenCode
  • Ollama
  • Gemini CLI
  • Google ADK

Critically, every adapter is implemented from scratch using vanilla JavaScript fetch() calls. There are no dependencies on vendor SDKs like @anthropic-ai/sdk or openai, nor any reliance on frameworks like LangChain or Vercel AI SDK. This design choice guarantees two key benefits:

  • Future-proofing: If a provider drops a model or changes pricing, your pipeline continues to work.
  • Cost transparency: Each adapter reports usage costs in real time, with no hidden fees or estimates.
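
The post doesn’t include adapter source, but a bare-bones fetch() adapter for a local Ollama model gives a feel for the approach. The endpoint and response fields follow Ollama’s documented /api/generate API; the usage bookkeeping is a sketch of my own:

// Minimal Ollama adapter: plain fetch(), no vendor SDK.
async function ollamaGenerate(model, prompt) {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return {
    text: data.response,
    // Local inference costs nothing in cloud spend, but token counts are
    // still reported so the dashboard can aggregate usage across tiers.
    usage: { promptTokens: data.prompt_eval_count, completionTokens: data.eval_count, costUsd: 0 },
  };
}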

When a model hits a rate limit mid-run, Anvil automatically falls back to the next available option in the same tier—whether from the same provider or a different one. This resilience keeps pipelines running smoothly, even in the face of provider constraints.
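
How that fallback works internally isn’t spelled out, but a simple version walks the resolved candidates in order and skips past rate-limit errors. This is a sketch only: how a rate limit surfaces differs per provider, and model.generate() is a hypothetical adapter method:

// Try each candidate in preference order; on a 429 (rate limit), move on.
async function callWithFallback(candidates, prompt) {
  for (const model of candidates) {
    try {
      return await model.generate(prompt);   // adapter-specific fetch() call
    } catch (err) {
      if (err.status === 429) continue;      // rate limited: try the next model
      throw err;                             // anything else is a real failure
    }
  }
  throw new Error('All candidate models are rate limited');
}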

What’s included in v0.1.0: a pipeline built for efficiency

The first stable release, v0.1.0, packs a suite of features designed to maximize cost efficiency and output quality. Here’s what’s inside:

A nine-stage pipeline runner

Anvil’s pipeline breaks down feature requests into nine discrete stages, each executed by the most cost-effective model. Short, focused calls reduce token waste, while fallback mechanisms ensure that a single provider’s failure doesn’t derail the entire process. This granularity also allows teams to fine-tune model selection per stage, balancing speed and accuracy.

Hybrid retrieval for sharper context

To minimize reliance on premium models, Anvil employs a hybrid retrieval system that combines:

  • Vector search
  • BM25 (a robust keyword-based retrieval method)
  • Project graph retrieval (for code-aware searches)
  • Cross-encoder reranking
  • AST chunking via tree-sitter

Because even inexpensive models receive precise, relevant context, they can handle tasks that would otherwise demand frontier-tier LLMs. For instance, the build stage rarely requires a premium model because it already has access to the most relevant code snippets.
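
Anvil’s retrieval internals aren’t published in the post, but the basic trick of blending keyword and vector rankings can be illustrated with reciprocal rank fusion, a common way to merge BM25 and embedding results. The chunk IDs below are placeholders:

// Merge two ranked lists of chunk IDs (BM25 and vector search) via reciprocal rank fusion.
function fuseRankings(bm25Results, vectorResults, k = 60) {
  const scores = new Map();
  for (const results of [bm25Results, vectorResults]) {
    results.forEach((chunkId, rank) => {
      scores.set(chunkId, (scores.get(chunkId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  // Highest fused score first; a cross-encoder would then rerank this shortlist.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([chunkId]) => chunkId);
}

fuseRankings(['auth.js#12', 'db.js#3'], ['db.js#3', 'routes.js#7']);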

Long-term memory with smart pruning

Anvil includes a long-term memory system that tracks code facts and detects drift. Once an agent resolves an issue, the solution is stored and reused, eliminating redundant token consumption. Stale memories are automatically pruned, keeping the knowledge base fresh without manual intervention.
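
The memory format isn’t described, but the prune-on-staleness idea can be sketched as a timestamped fact store. The fields and the thirty-day threshold are invented for illustration:

// Keep a fact only if it was verified recently and the code it describes hasn't drifted.
const THIRTY_DAYS = 30 * 24 * 60 * 60 * 1000;

function pruneMemories(memories, now = Date.now()) {
  return memories.filter(fact =>
    now - fact.lastVerifiedAt < THIRTY_DAYS &&   // not stale
    !fact.driftDetected                          // source files haven't changed underneath it
  );
}

// Example fact as it might be stored after a resolved issue.
const fact = {
  key: 'auth/session-expiry',
  summary: 'Sessions expire after 24h; refresh handled in middleware/session.js',
  lastVerifiedAt: Date.now(),
  driftDetected: false,
};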

Convention engine: rules over LLM calls

Recurring review issues—like formatting mistakes or missing tests—can be encoded into deterministic rules via Anvil’s convention engine. After a pattern is flagged twice, the rule catches it during the linting phase instead of the review phase, saving LLM tokens and reducing latency.
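
A minimal version of that promote-after-two-occurrences logic might look like the following; the finding and rule shapes are hypothetical:

// Count recurring review findings; after the second occurrence,
// promote the pattern to a deterministic lint rule so no LLM call is needed.
const findingCounts = new Map();
const lintRules = [];

function recordFinding(finding) {
  const count = (findingCounts.get(finding.pattern) ?? 0) + 1;
  findingCounts.set(finding.pattern, count);
  if (count === 2) {
    lintRules.push({ pattern: finding.pattern, message: finding.message });
  }
}

// A missing-test finding seen twice becomes a lint-time check.
recordFinding({ pattern: 'missing-unit-test', message: 'New module has no test file' });
recordFinding({ pattern: 'missing-unit-test', message: 'New module has no test file' });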

Plan validator: catching errors before they happen

Anvil’s plan validator scans proposed solutions for common pitfalls, including:

  • Missing tests
  • Incorrect stage routing
  • Undocumented rollback strategies

By addressing issues upfront, teams avoid costly fixes later in the pipeline.
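
As a rough illustration of what those checks could look like in code (the plan fields here are invented, not Anvil’s actual schema):

// Return a list of problems found in a proposed plan before any build stage runs.
function validatePlan(plan, stagePolicy) {
  const issues = [];
  if (!plan.tests || plan.tests.length === 0) issues.push('Plan adds code but no tests');
  if (!plan.rollback) issues.push('Plan has no documented rollback strategy');
  for (const step of plan.steps) {
    if (!(step.stage in stagePolicy.stages)) issues.push(`Unknown stage in routing: ${step.stage}`);
  }
  return issues;
}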

Multi-pass PR review for thoroughness

The PR review system enhances quality with:

  • Evidence gating (ensuring only relevant changes are reviewed)
  • Scope matching (preventing over-scoping)
  • Knowledge-base context (providing background on past decisions)
  • Dismissal filtering (suppressing false positives)

Again, premium models are reserved for stages where their impact justifies the cost.

Real-time cost tracking with OpenTelemetry

Every LLM call logs its actual cost, computed from a vendored LiteLLM pricing table, so there are no surprises in the cloud bill. Anvil is MIT-licensed, so the accounting is fully auditable, and its local-first design means no telemetry is sent to external servers.
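
The exact accounting isn’t shown, but per-call tracking against a vendored pricing table reduces to a lookup and a multiply. The rates below are placeholders, not real prices:

// Hypothetical excerpt of a vendored pricing table: USD per input/output token.
const pricing = {
  'claude-sonnet-4.6': { input: 3e-6, output: 15e-6 },   // placeholder rates
  'qwen3:14b': { input: 0, output: 0 },                   // local model: free
};

// Convert a single call's token usage into dollars for the run's cost report.
function costOfCall(model, usage) {
  const p = pricing[model] ?? { input: 0, output: 0 };
  return usage.promptTokens * p.input + usage.completionTokens * p.output;
}

costOfCall('claude-sonnet-4.6', { promptTokens: 4000, completionTokens: 1200 });   // ≈ 0.03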

Getting started: install and deploy in minutes

Setting up Anvil is straightforward:

npm install -g @esankhan3/anvil-cli
anvil init
anvil dashboard

The CLI bundles a React-based dashboard and WebSocket backend into a single Node.js process. From the dashboard, users can:

  • Launch pipelines
  • Review run history
  • Inspect the knowledge graph
  • Monitor memory usage
  • Configure provider keys

For offline-capable setups, Anvil supports Ollama out of the box:

brew install ollama
ollama pull qwen3:14b

Alternatively, OpenCode Zen subscribers can replace local models entirely with hosted coding models, removing the need for a GPU or other local hardware.

What’s next: intentional trade-offs in v0.1.0

The initial release makes deliberate omissions to prioritize flexibility and transparency:

  • No hosted plan: Anvil is fully self-hosted, with no SaaS dependencies. The team plans to keep it provider-agnostic to avoid conflicts of interest.
  • No vendor SDKs: Every adapter is hand-rolled to prevent lock-in to a single provider’s ecosystem.

These choices reflect a commitment to open infrastructure over commercial convenience. Future updates may explore additional features, but the core philosophy—provider-agnostic, cost-aware AI coding—remains unchanged.

As AI tools evolve, the ability to mix and match models without vendor constraints will separate the tools that scale from those that stagnate. Anvil offers a glimpse of that future today.

