CLI vs MCP: How token efficiency reshapes AI agent tooling

A recent experiment pitted Google Chrome’s native DevTools MCP server against a custom CLI wrapper to see how each method affects token usage and agent efficiency in GitHub Copilot CLI. The goal wasn’t to crown a winner but to highlight where the discovery and execution costs land—and how they scale when multiple tools are involved.

The experiment’s core question

Many AI agents consume tools exposed by MCP (Model Context Protocol) servers, which load every tool's schema into the model's context upfront. This automates tool discovery, but it also adds significant token overhead: sometimes thousands of tokens before the agent even begins work. The alternative? A CLI wrapper that lets agents discover and interact with tools the same way they would with any command-line utility.

In this test, the developer ran the same 9-step browser smoke test against a private Streamlit app through two paths: the Chrome DevTools MCP server connected directly, and a custom skill that wraps the same MCP server using mcp2cli. Both paths used GitHub Copilot CLI with the gpt-5.3-codex-medium model, but the direct MCP path added roughly 5,000 upfront tokens compared to the CLI wrapper. That might not sound like much, but when several MCP servers are stacked together, the context bloat becomes hard to ignore.

Mode                       | Blank-session context (tokens) | MCP tools line (tokens) | Difference vs CLI path
---------------------------|--------------------------------|-------------------------|-----------------------
CLI skill path             | 19k                            | 155                     | baseline
Direct Chrome DevTools MCP | 24k                            | 4.9k                    | +5k
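The arithmetic behind the table can be checked in a few lines of Python. The sketch below uses only the numbers reported above; the stacking projection assumes, purely for illustration, that every additional MCP server costs about as much as Chrome DevTools MCP did, which real servers won't match exactly.

```python
# Token figures from the table above (blank session, before any prompt).
CLI_BLANK_TOTAL = 19_000
MCP_BLANK_TOTAL = 24_000
CLI_TOOLS_LINE = 155
MCP_TOOLS_LINE = 4_900


def schema_reduction(wrapped: int, direct: int) -> float:
    """Fraction of tool-definition tokens removed by the CLI wrapper."""
    return 1 - wrapped / direct


def stacked_overhead(per_server: int, servers: int) -> int:
    """Upfront tokens if every server costs `per_server` tokens (illustrative assumption)."""
    return per_server * servers


print(f"Upfront difference: {MCP_BLANK_TOTAL - CLI_BLANK_TOTAL:,} tokens")
print(f"MCP tools line reduction: {schema_reduction(CLI_TOOLS_LINE, MCP_TOOLS_LINE):.1%}")
for n in (1, 3, 5):
    print(f"{n} server(s) at ~4.9k each: {stacked_overhead(MCP_TOOLS_LINE, n):,} tokens")
```

Under that assumption, three such servers would already cost about 14.7k tokens before the first prompt.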

Why CLI wrappers cut down on wasted tokens

The mcp2cli README claims the wrapper can “Save 96–99% of the tokens wasted on tool schemas every turn.” The claim sounds aggressive, but the experiment's numbers are in line with it: the MCP tools line shrank from 4.9k to 155 tokens, a reduction of about 97%. The mechanism is simple. CLI tools don't rely on JSON schemas or verbose tool descriptions; they expose their capabilities through command names, flags, and --help output, structures the model already understands from its training.
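To see why schemas cost more than a command line, compare an MCP-style tool definition (the `name`, `description`, and `inputSchema` fields that MCP tool listings use) with a one-line usage string. The tool, its fields, and the `browser navigate` command are hypothetical, and the four-characters-per-token estimate is only a rough rule of thumb; real tokenizers differ.

```python
import json

# Hypothetical MCP-style tool definition: the name/description/inputSchema shape
# follows MCP tool listings, but this tool is illustrative, not Chrome DevTools MCP's.
TOOL_DEFINITION = {
    "name": "navigate_page",
    "description": "Navigate the selected page to a URL and wait for it to load.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Absolute URL to open."},
            "timeout": {"type": "integer", "description": "Maximum wait in milliseconds."},
        },
        "required": ["url"],
    },
}

# The same capability as a CLI surface (hypothetical command).
CLI_USAGE = "browser navigate --url URL [--timeout MS]"


def rough_tokens(text: str) -> int:
    """Crude estimate of ~4 characters per token; real tokenizers differ."""
    return max(1, len(text) // 4)


schema_tokens = rough_tokens(json.dumps(TOOL_DEFINITION))
cli_tokens = rough_tokens(CLI_USAGE)
print(f"schema: ~{schema_tokens} tokens, CLI usage: ~{cli_tokens} tokens")
```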

For example, a developer choosing between the gh CLI and GitHub’s MCP server would likely prefer the former because the model already knows the CLI’s commands and arguments. There’s no need to waste tokens reintroducing a tool it’s already familiar with. The same logic applies here: wrapping an MCP server in a CLI surface lets the agent discover and use the tool with minimal context pollution.
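A minimal sketch of what wrapping an MCP server in a CLI surface can mean: a hypothetical `browser` command whose subcommands are translated into MCP JSON-RPC `tools/call` requests. This is not how mcp2cli is implemented; the command, subcommand, and tool names are illustrative assumptions, and the sketch only builds the request rather than sending it to a server.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical command surface; subcommand and flag names are illustrative.
    parser = argparse.ArgumentParser(prog="browser", description="CLI surface over an MCP server (sketch).")
    sub = parser.add_subparsers(dest="command", required=True)

    nav = sub.add_parser("navigate", help="Open a URL in the current page.")
    nav.add_argument("--url", required=True)

    click = sub.add_parser("click", help="Click an element by its snapshot UID.")
    click.add_argument("--uid", required=True)

    sub.add_parser("snapshot", help="Print the current page's snapshot.")
    return parser


# Map each subcommand to an MCP tool name (hypothetical names).
TOOL_NAMES = {"navigate": "navigate_page", "click": "click", "snapshot": "take_snapshot"}


def to_tools_call(argv: list[str], request_id: int = 1) -> dict:
    """Translate CLI arguments into an MCP JSON-RPC `tools/call` request."""
    args = vars(build_parser().parse_args(argv))
    command = args.pop("command")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": TOOL_NAMES[command], "arguments": args},
    }


if __name__ == "__main__":
    # 8501 is Streamlit's default port, used here only as an example URL.
    print(json.dumps(to_tools_call(["navigate", "--url", "http://localhost:8501"])))
```

Because argparse generates `browser --help` and `browser navigate --help` automatically, the agent can discover flags on demand instead of carrying every schema in context.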

The custom skill was even bootstrapped by an AI agent. Starting from just the GitHub repos for Chrome DevTools MCP and mcp2cli, the agent wrote the first version of the wrapper using documentation and runtime checks. For simple MCP servers with narrow workflows—like starting a session, navigating, inspecting state, and cleaning up—the docs were enough. More complex servers, however, might require running the server live during skill development so the agent can observe actual behavior instead of relying solely on static schema files.
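The narrow workflow described above (start a session, navigate, inspect state, clean up) maps naturally onto a context manager that guarantees cleanup even when a step fails. `FakeBrowser`, `browser_session`, and `run_smoke` are hypothetical; `FakeBrowser` records calls rather than driving Chrome, so the sketch runs anywhere.

```python
from contextlib import contextmanager
from typing import Iterator


class FakeBrowser:
    """Hypothetical stand-in for a wrapped browser server; records calls instead of driving Chrome."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def start(self) -> None:
        self.log.append("start")

    def navigate(self, url: str) -> None:
        self.log.append(f"navigate {url}")

    def snapshot(self) -> str:
        self.log.append("snapshot")
        return "<page snapshot>"

    def stop(self) -> None:
        self.log.append("stop")


@contextmanager
def browser_session(browser: FakeBrowser) -> Iterator[FakeBrowser]:
    """Start a session and always clean it up, even if a step raises."""
    browser.start()
    try:
        yield browser
    finally:
        browser.stop()


def run_smoke(browser: FakeBrowser, url: str) -> list[str]:
    """Start, navigate, inspect, clean up; return the recorded call log."""
    with browser_session(browser) as session:
        session.navigate(url)
        session.snapshot()
    return browser.log


print(run_smoke(FakeBrowser(), "http://localhost:8501"))
```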

Runtime results: speed vs. stability

The experiment ran three attempts on each path using identical prompts. Context and message figures are token counts. The outcomes reveal trade-offs between speed, consistency, and token growth.

CLI skill path:
Attempt 1: 39k total context, 20.5k messages
Attempt 2: 37k total context, 18.1k messages, runtime 259 seconds, 9/9 steps passed after a retry on a flaky checkbox
Attempt 3: 38k total context, 18.9k messages, runtime 141 seconds, 9/9 steps passed; recovered from a UID failure by clicking the label

Direct MCP path:
Attempt 1: 40k total context, 16.1k messages
Attempt 2: 62k total context, 38.7k messages, runtime ~101 seconds, fastest completed run
Attempt 3: 79k total context, 55.9k messages, runtime 241 seconds, 9/9 steps passed but with long waits; one delay exceeded 120 seconds

The fastest recorded run came from the direct MCP path (about 101 seconds, versus 141 seconds for the best timed CLI run), but the CLI wrapper kept message growth far more stable across attempts. The direct MCP path's third attempt strayed into unnecessary waits and reloads, inflating both context and runtime. The CLI skill, by contrast, showed consistent token usage and fewer erratic behaviors.
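The stability claim can be quantified from the attempt data above. This sketch computes the spread of message tokens per path and checks a pattern visible in the numbers: each total is roughly the blank-session baseline from the first table plus the message tokens. The spread is a crude stability measure chosen here for illustration, not a metric from the original experiment.

```python
# Token counts from the attempts above, in thousands.
BASELINE = {"cli": 19.0, "mcp": 24.0}  # blank-session totals from the first table
RUNS = {  # (total context, messages) per attempt
    "cli": [(39.0, 20.5), (37.0, 18.1), (38.0, 18.9)],
    "mcp": [(40.0, 16.1), (62.0, 38.7), (79.0, 55.9)],
}


def message_spread(path: str) -> float:
    """Max minus min message tokens across attempts: a crude stability measure."""
    messages = [m for _, m in RUNS[path]]
    return round(max(messages) - min(messages), 1)


def baseline_gaps(path: str) -> list[float]:
    """Difference between each reported total and baseline + messages."""
    return [round(total - (BASELINE[path] + m), 1) for total, m in RUNS[path]]


for path in RUNS:
    print(f"{path}: spread {message_spread(path)}k, gaps {baseline_gaps(path)}")
```

On these figures the CLI path's message tokens vary by about 2.4k across attempts, while the direct MCP path's vary by about 39.8k; every total lands within 1k of baseline plus messages.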

The hidden factor: agent behavior

Token usage and runtime aren’t just about the tool surface; they’re shaped by how the model navigates the task. A single stale UID, a misplaced wait loop, or an overzealous snapshot can skew results dramatically. Context engineering—tuning the tool surface—can only go so far when the agent’s random walk through the browser trace is the dominant variable.

For instance, one CLI attempt needed a retry on a flaky checkbox, yet total context stayed nearly flat (between 37k and 39k across all three CLI runs). In contrast, the direct MCP path's message tokens climbed from 16.1k in the first attempt to 55.9k in the third, as the agent wandered into long pauses. That volatility makes it hard to draw definitive rankings from such a small sample.
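The stale-UID problem and the label-click recovery seen in the third CLI attempt follow a common retry-with-fallback pattern. In this sketch, `StaleUidError`, `click_with_fallback`, `always_stale`, and the two click callables are hypothetical; they stand in for whatever the wrapper or MCP server actually exposes.

```python
from typing import Callable


class StaleUidError(Exception):
    """Hypothetical error: a snapshot UID no longer points at a live element."""


def click_with_fallback(
    click_by_uid: Callable[[str], None],
    click_by_label: Callable[[str], None],
    uid: str,
    label: str,
) -> str:
    """Try the snapshot UID first; if it has gone stale, click by visible label instead.

    Returns which strategy succeeded. Other errors propagate unchanged.
    """
    try:
        click_by_uid(uid)
        return "uid"
    except StaleUidError:
        click_by_label(label)
        return "label"


def always_stale(uid: str) -> None:
    """Demo stand-in for a UID click that fails because the page re-rendered."""
    raise StaleUidError(uid)


clicked: list[str] = []
print(click_with_fallback(always_stale, clicked.append, "e42", "Accept terms"), clicked)
```

Catching only the stale-UID error keeps the fallback narrow, so unrelated failures still surface instead of being silently masked by a label click.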

Looking ahead: CLI-first tooling for AI agents

This experiment suggests that CLI-first tooling may offer a more scalable and predictable path for AI agents. By reducing upfront context bloat and letting models interact with tools the way humans do, through commands, flags, and help text, developers can avoid the token tax that comes with verbose MCP schemas.

Of course, not all tools lend themselves to CLI wrappers. Complex MCP servers with rich runtime APIs might still benefit from direct integration. But for simpler, well-scoped tasks, the efficiency gains are hard to ignore. As AI agents take on more workflows, the balance between discovery costs, runtime stability, and token efficiency will only grow in importance. The question isn’t just which tool is faster—it’s where the real work happens: in the tool’s schema or in the agent’s path through it.
