Building AI-powered tools is easier than ever, but the convenience comes with hidden costs. Many developers rely on integrated AI assistants or wrapper services that silently inflate bills by sending unnecessary tokens to premium models. One developer decided to take control—and saved 41% on their monthly AI spending by replacing a default workflow with a lightweight, intent-based router.
The hidden markup in AI wrapper services
AI subscription wrappers—like Cursor, Copilot, and others—simplify workflows by handling model selection and context management behind the scenes. While this saves time, it often leads to overbilling for services you don’t actually need. For example, a simple typo fix or variable rename might be routed to a high-cost model instead of a lightweight alternative, just because the wrapper defaults to convenience over cost efficiency.
Audit logs from one developer showed a striking discrepancy: Cursor sent an average of 8,400 input tokens per session, while the same prompt sent directly to Anthropic averaged only 1,900 tokens. The extra 6,500 tokens were the wrapper’s own system prompts, indexing metadata, and scaffolding—not the user’s actual input. This orchestration tax compounds across teams and projects, making wrapper subscriptions far costlier than they appear.
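That overhead is easy to put a dollar figure on. A quick sketch, using the audit numbers above and the $3.00-per-million input rate quoted later in the article for the sonnet-tier model (the per-session framing is illustrative):

```typescript
// Cost of the wrapper's overhead tokens, using the audit numbers above.
const wrapperTokens = 8400;   // avg input tokens per session via the wrapper
const directTokens = 1900;    // same prompt sent directly to the API
const inputRatePerMTok = 3.0; // USD per million input tokens (sonnet-tier)

const overheadTokens = wrapperTokens - directTokens; // 6,500 tokens
const overheadUsdPerSession = (overheadTokens / 1_000_000) * inputRatePerMTok;

console.log(overheadTokens, overheadUsdPerSession.toFixed(5));
// → 6500 0.01950
```

Two cents per session sounds trivial until you multiply it by dozens of sessions per developer per day across a team.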
A 200-line solution for smarter AI routing
Instead of accepting opaque pricing from wrappers, the developer built a custom TypeScript router that selects models based on intent, not habit. The router is intentionally concise—just 200 lines—yet flexible enough to handle a wide range of use cases. It uses pattern matching and simple rules to route prompts to the most cost-effective model available.
Here’s the core logic, stripped of boilerplate:
// router.ts
import Anthropic from "@anthropic-ai/sdk";
import OpenAI from "openai";

type Intent = "trivial" | "code" | "plan" | "embed";

interface RouteRule {
  match: (prompt: string) => boolean;
  intent: Intent;
}

interface ModelConfig {
  provider: "anthropic" | "openai";
  model: string;
  maxTokens: number;
}

const ROUTES: Record<Intent, ModelConfig> = {
  trivial: {
    provider: "anthropic",
    model: "claude-haiku-4-5-20251001",
    maxTokens: 1024,
  },
  code: {
    provider: "anthropic",
    model: "claude-sonnet-4-6",
    maxTokens: 4096,
  },
  plan: {
    provider: "anthropic",
    model: "claude-opus-4-7",
    maxTokens: 8192,
  },
  embed: {
    provider: "openai",
    model: "gpt-5-mini",
    maxTokens: 512,
  },
};

const RULES: RouteRule[] = [
  {
    intent: "trivial",
    match: (p) => p.length < 200 && /\?$/.test(p.trim()),
  },
  {
    intent: "trivial",
    match: (p) => /^(what is|define|fix typo|rename)/i.test(p),
  },
  {
    intent: "plan",
    match: (p) => /(refactor|design|architect|migrate|plan)/i.test(p),
  },
  {
    intent: "code",
    match: (p) => /(```|function |class |const |let )/i.test(p),
  },
  {
    intent: "embed",
    match: (p) => p.startsWith("CLASSIFY:"),
  },
];

function pickIntent(prompt: string): Intent {
  for (const rule of RULES) {
    if (rule.match(prompt)) return rule.intent;
  }
  return "code";
}
// Example usage. route() wires pickIntent and ROUTES to the provider SDKs;
// it is part of the boilerplate omitted above.
const result = await route("rename this function from getUser to fetchUser");
console.log(result.model, result.costUsd.toFixed(5));
// Output: claude-haiku-4-5-20251001 0.00012

The router's pricing structure is transparent and customizable. Developers define token-based rates in a simple lookup table, so the system can compute real-time costs before sending a prompt. For instance, trivial tasks like lookups or typo fixes are routed to claude-haiku-4-5, which costs $0.80 per million input tokens and $4.00 per million output tokens, far cheaper than routing them to claude-sonnet-4-6 at $3.00/$15.00 per million.
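Since route() itself sits in the omitted boilerplate, here is a minimal sketch of the pricing lookup and pre-flight cost estimate it implies. The rates for the two Anthropic models are the ones quoted above; the rough 4-characters-per-token heuristic and the estimateCost name are illustrative assumptions, not the author's implementation:

```typescript
// Per-million-token rates in USD. The two Anthropic entries use the rates
// quoted in the article; other models would be filled in the same way.
const PRICING: Record<string, { inPerMTok: number; outPerMTok: number }> = {
  "claude-haiku-4-5-20251001": { inPerMTok: 0.8, outPerMTok: 4.0 },
  "claude-sonnet-4-6": { inPerMTok: 3.0, outPerMTok: 15.0 },
};

// Pre-flight estimate: ~4 characters per token is a common rough heuristic,
// not an exact tokenizer count. Output cost is bounded by maxTokens.
function estimateCost(
  model: string,
  prompt: string,
  maxOutputTokens: number
): number {
  const rates = PRICING[model];
  if (!rates) throw new Error(`no pricing entry for ${model}`);
  const inputTokens = Math.ceil(prompt.length / 4);
  return (
    (inputTokens / 1_000_000) * rates.inPerMTok +
    (maxOutputTokens / 1_000_000) * rates.outPerMTok
  );
}
```

A route() wrapper can then call estimateCost before dispatching, logging the figure or downgrading to a cheaper route when an estimate exceeds a budget.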
Intent-based routing outperforms default choices
The key insight is that most coding queries aren’t complex—they’re simple lookups disguised as questions. By analyzing prompts and classifying their intent, the router avoids unnecessary premium model usage. For example:
- A prompt like "What does map do in JavaScript?" matches the trivial intent and is routed to Haiku.
- A request for "Refactor the user service to use dependency injection" triggers the plan intent and uses Opus.
- A prompt starting with CLASSIFY: is sent to gpt-5-mini for fast, low-cost embeddings.
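Those classifications fall straight out of the RULES list. A condensed, self-contained check (rules copied verbatim from the router snippet above, with pickIntent collapsed to a one-liner):

```typescript
type Intent = "trivial" | "code" | "plan" | "embed";

// The same ordered rules as in router.ts: first match wins.
const RULES: { intent: Intent; match: (p: string) => boolean }[] = [
  { intent: "trivial", match: (p) => p.length < 200 && /\?$/.test(p.trim()) },
  { intent: "trivial", match: (p) => /^(what is|define|fix typo|rename)/i.test(p) },
  { intent: "plan", match: (p) => /(refactor|design|architect|migrate|plan)/i.test(p) },
  { intent: "code", match: (p) => /(```|function |class |const |let )/i.test(p) },
  { intent: "embed", match: (p) => p.startsWith("CLASSIFY:") },
];

const pickIntent = (p: string): Intent =>
  RULES.find((r) => r.match(p))?.intent ?? "code";

console.log(pickIntent("What does map do in JavaScript?")); // trivial
console.log(pickIntent("Refactor the user service to use dependency injection")); // plan
console.log(pickIntent("CLASSIFY: support ticket text")); // embed
```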
Developers can override the default rules with a simple prefix, such as [force:opus] to force a specific model for edge cases. This keeps the core router lean while allowing fine-tuning for specialized workflows.
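The override can be a simple regex peel that runs before intent classification. This sketch assumes the [force:...] tag names a model alias; the alias-to-intent mapping and the parseOverride name are illustrative assumptions:

```typescript
type Intent = "trivial" | "code" | "plan" | "embed";

// Illustrative mapping from override aliases to router intents.
const FORCE_ALIASES: Record<string, Intent> = {
  haiku: "trivial",
  sonnet: "code",
  opus: "plan",
};

// Strip a leading [force:alias] tag, returning the forced intent (if any)
// and the remaining prompt for normal classification.
function parseOverride(prompt: string): { forced?: Intent; rest: string } {
  const m = prompt.match(/^\[force:(\w+)\]\s*/i);
  if (!m) return { rest: prompt };
  return {
    forced: FORCE_ALIASES[m[1].toLowerCase()],
    rest: prompt.slice(m[0].length),
  };
}
```

When forced is set, the router skips rule matching entirely and dispatches to that route, so the override never fights the heuristics.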
What’s next for AI cost optimization
The success of this approach highlights a growing trend: developers are moving from off-the-shelf AI tools to lightweight, cost-aware alternatives. By taking control of model selection, teams can reduce cloud bills without sacrificing performance. As AI models continue to diversify, the ability to route intelligently—based on intent, cost, and context—will become a core competency for engineering teams.
For those tired of opaque pricing and overbilling, the message is clear: a few hundred lines of code can unlock significant savings—and maybe even outpace the wrappers on their own turf.
AI summary
You can cut your AI costs by building your own router. Here is the 200-line TypeScript code.