Building AI-powered tools is easier than ever, but the convenience comes with hidden costs. Many developers rely on integrated AI assistants or wrapper services that silently inflate bills by sending unnecessary tokens to premium models. One developer decided to take control—and saved 41% on their monthly AI spending by replacing a default workflow with a lightweight, intent-based router.
The hidden markup in AI wrapper services
AI subscription wrappers—like Cursor, Copilot, and others—simplify workflows by handling model selection and context management behind the scenes. While this saves time, it often leads to overbilling for services you don’t actually need. For example, a simple typo fix or variable rename might be routed to a high-cost model instead of a lightweight alternative, just because the wrapper defaults to convenience over cost efficiency.
Audit logs from one developer showed a striking discrepancy: Cursor sent an average of 8,400 input tokens per session, while the same prompt sent directly to Anthropic averaged only 1,900 tokens. The extra 6,500 tokens were the wrapper’s own system prompts, indexing metadata, and scaffolding—not the user’s actual input. This orchestration tax compounds across teams and projects, making wrapper subscriptions far costlier than they appear.
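That overhead is easy to put a dollar figure on. A quick sketch, using the audit numbers above and the $3.00-per-million input rate quoted later in the article for the sonnet-tier model (the per-session framing is illustrative):

```typescript
// Cost of the wrapper's overhead tokens, using the audit numbers above.
const wrapperTokens = 8400;   // avg input tokens per session via the wrapper
const directTokens = 1900;    // same prompt sent directly to the API
const inputRatePerMTok = 3.0; // USD per million input tokens (sonnet-tier)

const overheadTokens = wrapperTokens - directTokens; // 6,500 tokens
const overheadUsdPerSession = (overheadTokens / 1_000_000) * inputRatePerMTok;

console.log(overheadTokens, overheadUsdPerSession.toFixed(5));
// → 6500 0.01950
```

Two cents per session sounds trivial until you multiply it by dozens of sessions per developer per day across a team.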
A 200-line solution for smarter AI routing
Instead of accepting opaque pricing from wrappers, the developer built a custom TypeScript router that selects models based on intent, not habit. The router is intentionally concise—just 200 lines—yet flexible enough to handle a wide range of use cases. It uses pattern matching and simple rules to route prompts to the most cost-effective model available.
Here’s the core logic, stripped of boilerplate:
// router.ts
import Anthropic from "@anthropic-ai/sdk";
import OpenAI from "openai";

type Intent = "trivial" | "code" | "plan" | "embed";

interface RouteRule {
  match: (prompt: string) => boolean;
  intent: Intent;
}

interface ModelConfig {
  provider: "anthropic" | "openai";
  model: string;
  maxTokens: number;
}

const ROUTES: Record<Intent, ModelConfig> = {
  trivial: {
    provider: "anthropic",
    model: "claude-haiku-4-5-20251001",
    maxTokens: 1024,
  },
  code: {
    provider: "anthropic",
    model: "claude-sonnet-4-6",
    maxTokens: 4096,
  },
  plan: {
    provider: "anthropic",
    model: "claude-opus-4-7",
    maxTokens: 8192,
  },
  embed: {
    provider: "openai",
    model: "gpt-5-mini",
    maxTokens: 512,
  },
};

const RULES: RouteRule[] = [
  {
    intent: "trivial",
    match: (p) => p.length < 200 && /\?$/.test(p.trim()),
  },
  {
    intent: "trivial",
    match: (p) => /^(what is|define|fix typo|rename)/i.test(p),
  },
  {
    intent: "plan",
    match: (p) => /(refactor|design|architect|migrate|plan)/i.test(p),
  },
  {
    intent: "code",
    match: (p) => /(```|function |class |const |let )/i.test(p),
  },
  {
    intent: "embed",
    match: (p) => p.startsWith("CLASSIFY:"),
  },
];

function pickIntent(prompt: string): Intent {
  for (const rule of RULES) {
    if (rule.match(prompt)) return rule.intent;
  }
  return "code";
}
// Example usage. route() wires pickIntent and ROUTES to the provider SDKs;
// it is part of the boilerplate omitted above.
const result = await route("rename this function from getUser to fetchUser");
console.log(result.model, result.costUsd.toFixed(5));
// Output: claude-haiku-4-5-20251001 0.00012

The router's pricing structure is transparent and customizable. Developers define token-based rates in a simple lookup table, so the system can compute real-time costs before sending a prompt. For instance, trivial tasks like lookups or typo fixes are routed to claude-haiku-4-5, which costs $0.80 per million input tokens and $4.00 per million output tokens, far cheaper than routing them to claude-sonnet-4-6 at $3.00/$15.00 per million.
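Since route() itself sits in the omitted boilerplate, here is a minimal sketch of the pricing lookup and pre-flight cost estimate it implies. The rates for the two Anthropic models are the ones quoted above; the rough 4-characters-per-token heuristic and the estimateCost name are illustrative assumptions, not the author's implementation:

```typescript
// Per-million-token rates in USD. The two Anthropic entries use the rates
// quoted in the article; other models would be filled in the same way.
const PRICING: Record<string, { inPerMTok: number; outPerMTok: number }> = {
  "claude-haiku-4-5-20251001": { inPerMTok: 0.8, outPerMTok: 4.0 },
  "claude-sonnet-4-6": { inPerMTok: 3.0, outPerMTok: 15.0 },
};

// Pre-flight estimate: ~4 characters per token is a common rough heuristic,
// not an exact tokenizer count. Output cost is bounded by maxTokens.
function estimateCost(
  model: string,
  prompt: string,
  maxOutputTokens: number
): number {
  const rates = PRICING[model];
  if (!rates) throw new Error(`no pricing entry for ${model}`);
  const inputTokens = Math.ceil(prompt.length / 4);
  return (
    (inputTokens / 1_000_000) * rates.inPerMTok +
    (maxOutputTokens / 1_000_000) * rates.outPerMTok
  );
}
```

A route() wrapper can then call estimateCost before dispatching, logging the figure or downgrading to a cheaper route when an estimate exceeds a budget.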
Intent-based routing outperforms default choices
The key insight is that most coding queries aren’t complex—they’re simple lookups disguised as questions. By analyzing prompts and classifying their intent, the router avoids unnecessary premium model usage. For example:
- A prompt like "What does map do in JavaScript?" matches the trivial intent and is routed to Haiku.
- A request for "Refactor the user service to use dependency injection" triggers the plan intent and uses Opus.
- A prompt starting with CLASSIFY: is sent to gpt-5-mini for fast, low-cost embeddings.
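Those classifications fall straight out of the RULES list. A condensed, self-contained check (rules copied verbatim from the router snippet above, with pickIntent collapsed to a one-liner):

```typescript
type Intent = "trivial" | "code" | "plan" | "embed";

// The same ordered rules as in router.ts: first match wins.
const RULES: { intent: Intent; match: (p: string) => boolean }[] = [
  { intent: "trivial", match: (p) => p.length < 200 && /\?$/.test(p.trim()) },
  { intent: "trivial", match: (p) => /^(what is|define|fix typo|rename)/i.test(p) },
  { intent: "plan", match: (p) => /(refactor|design|architect|migrate|plan)/i.test(p) },
  { intent: "code", match: (p) => /(```|function |class |const |let )/i.test(p) },
  { intent: "embed", match: (p) => p.startsWith("CLASSIFY:") },
];

const pickIntent = (p: string): Intent =>
  RULES.find((r) => r.match(p))?.intent ?? "code";

console.log(pickIntent("What does map do in JavaScript?")); // trivial
console.log(pickIntent("Refactor the user service to use dependency injection")); // plan
console.log(pickIntent("CLASSIFY: support ticket text")); // embed
```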
Developers can override the default rules with a simple prefix, such as [force:opus] to force a specific model for edge cases. This keeps the core router lean while allowing fine-tuning for specialized workflows.
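The override can be a simple regex peel that runs before intent classification. This sketch assumes the [force:...] tag names a model alias; the alias-to-intent mapping and the parseOverride name are illustrative assumptions:

```typescript
type Intent = "trivial" | "code" | "plan" | "embed";

// Illustrative mapping from override aliases to router intents.
const FORCE_ALIASES: Record<string, Intent> = {
  haiku: "trivial",
  sonnet: "code",
  opus: "plan",
};

// Strip a leading [force:alias] tag, returning the forced intent (if any)
// and the remaining prompt for normal classification.
function parseOverride(prompt: string): { forced?: Intent; rest: string } {
  const m = prompt.match(/^\[force:(\w+)\]\s*/i);
  if (!m) return { rest: prompt };
  return {
    forced: FORCE_ALIASES[m[1].toLowerCase()],
    rest: prompt.slice(m[0].length),
  };
}
```

When forced is set, the router skips rule matching entirely and dispatches to that route, so the override never fights the heuristics.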
What’s next for AI cost optimization
The success of this approach highlights a growing trend: developers are moving from off-the-shelf AI tools to lightweight, cost-aware alternatives. By taking control of model selection, teams can reduce cloud bills without sacrificing performance. As AI models continue to diversify, the ability to route intelligently—based on intent, cost, and context—will become a core competency for engineering teams.
For those tired of opaque pricing and overbilling, the message is clear: a few hundred lines of code can unlock significant savings—and maybe even outpace the wrappers on their own turf.
AI summary
You can cut your AI costs by building your own router. Here is the 200-line TypeScript code.