Centralize AI coding quotas with a single monitoring script

Juggling multiple AI coding assistants often feels like navigating a maze blindfolded. You start a session with one model, only to hit an unexpected rate limit mid-task. After spending minutes checking each tool individually, you realize you need a better system. That frustration led me to build a single monitoring script that tracks quotas for Claude Code, Gemini CLI, and Codex from one command.

The pain of fragmented quota tracking

Many developers rely on three popular AI coding assistants daily: Claude Code, Gemini CLI, and Codex. Each handles rate limits differently, and none provides clear real-time visibility into your remaining budget. The typical experience involves:

Claude Code silently locks you out when you reach your 5-hour rolling window or weekly allocation
Gemini CLI counts requests but offers no official quota dashboard outside its web interface
Codex buries its rate limits inside an interactive terminal UI that can’t be scripted directly

Without a unified view, you waste time switching between tools, guessing which model has remaining capacity, or waiting for resets you didn’t track.

Tracking Claude Code’s hidden usage metrics

Claude Code authenticates via OAuth, storing credentials in ~/.claude/.credentials.json. While Anthropic’s public API documentation is silent on quota details, their internal endpoints expose the data you need. The undocumented /api/oauth/usage endpoint reports utilization for both the rolling window and the weekly window, along with reset timestamps.

Here’s how to query the endpoint directly:

TOKEN=$(jq -r '.claudeAiOauth.accessToken // empty' "$HOME/.claude/.credentials.json")
curl -s --max-time 10 \
  -H "Authorization: Bearer $TOKEN" \
  -H "anthropic-beta: oauth-2025-04-20" \
  "$USAGE_URL"   # placeholder variable: set it to the full URL of the /api/oauth/usage endpoint

The response includes structured data for both time windows:

{
  "five_hour": {
    "utilization": 0.42,
    "resets_at": "2026-02-28T17:00:00Z"
  },
  "seven_day": {
    "utilization": 0.61,
    "resets_at": "2026-03-07T08:00:00Z"
  },
  "seven_day_sonnet": {
    "utilization": 0.35,
    "resets_at": "2026-03-07T08:00:00Z"
  }
}

Key discoveries from monitoring this endpoint for weeks:

The 5-hour window resets based on a rolling schedule, not daily at midnight
Weekly limits reset every Thursday at 8pm Pacific Time
The undocumented anthropic-beta header is required; omitting it returns 401 errors

Reliance on an undocumented endpoint introduces maintenance risk, as future API changes could break the collector silently.
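
A response shaped like the sample above can be reduced to "how much is used, and how long until it resets" with a few lines of Python. This is a minimal sketch that assumes the field names shown in the sample (utilization, resets_at) and ISO 8601 timestamps; the helper names parse_reset, summarize_usage, and format_remaining are illustrative, not part of any Anthropic tooling.

```python
from datetime import datetime, timezone


def parse_reset(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2026-02-28T17:00:00Z' as timezone-aware."""
    # Older Python versions reject a trailing 'Z' in fromisoformat, so normalize it.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def summarize_usage(payload: dict, now: datetime) -> dict:
    """Return {window: (utilization, seconds_until_reset)} for each quota window."""
    summary = {}
    for window, info in payload.items():
        # Skip anything that does not look like a quota window with a reset time.
        if not isinstance(info, dict) or not info.get("resets_at"):
            continue
        remaining = (parse_reset(info["resets_at"]) - now).total_seconds()
        summary[window] = (info.get("utilization"), max(0, int(remaining)))
    return summary


def format_remaining(seconds: int) -> str:
    """Render a duration in seconds as a compact string like '3h12m'."""
    hours, rem = divmod(seconds, 3600)
    return f"{hours}h{rem // 60:02d}m"
```

Passing the current time in as an argument, rather than calling datetime.now() inside the function, keeps the calculation easy to check against a saved response. Because unknown or malformed fields are skipped instead of raising, a schema change is more likely to show up as a missing window than as a crash.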

Extracting Gemini CLI usage from file system artifacts

Google’s Gemini CLI lacks any official quota dashboard, leaving developers to reverse-engineer usage patterns from system files. The workaround involves parsing session artifacts stored in ~/.gemini/tmp/_/chats/ where each file represents one conversation session.

Free tier limits cap daily requests at 1,000, but these files only provide a lower-bound estimate since one session can contain multiple API calls. To approximate usage:

Count session files created today to estimate request volume
Parse each file’s messages array to sum tokens.total values for precise consumption data

A Python implementation might look like:

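import glob
import json
import os
from datetime import datetime

base = os.path.expanduser("~/.gemini/tmp/_/chats")
files = glob.glob(os.path.join(base, "session-*.json"))
week = {}  # date string -> {"sessions": count, "tokens": total}

today = datetime.now().strftime("%Y-%m-%d")

for f in files:
    # The 10 characters after the "session-" prefix are taken as the date.
    file_date = os.path.basename(f)[8:18]
    try:
        with open(f) as fh:
            data = json.load(fh)
    except (OSError, json.JSONDecodeError):
        continue  # skip unreadable or partially written session files
    file_tokens = sum(
        (m.get("tokens") or {}).get("total", 0)
        for m in data.get("messages", [])
    )
    day = week.setdefault(file_date, {"sessions": 0, "tokens": 0})
    day["sessions"] += 1
    day["tokens"] += file_tokens

# Report today's lower-bound usage.
stats = week.get(today, {"sessions": 0, "tokens": 0})
print(f"{today}: {stats['sessions']} sessions, {stats['tokens']} tokens")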

While this method tracks closely enough for warning thresholds, it won’t catch you at exactly 999 requests. Any changes to file structure or schema would require immediate updates to maintain accuracy.
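
To turn those per-day counts into a warning, one option is to compare today's session count against the daily cap. The sketch below assumes a dictionary with the same shape as week in the script above; the function name gemini_status and the 80% default for warn_ratio are illustrative choices, not values taken from Gemini CLI.

```python
def gemini_status(week: dict, today: str, daily_limit: int = 1000,
                  warn_ratio: float = 0.8) -> dict:
    """Summarize today's estimated usage against the daily request cap.

    The session count is only a lower bound on requests, so the reported
    utilization is a minimum, not an exact figure.
    """
    day = week.get(today, {"sessions": 0, "tokens": 0})
    sessions = day["sessions"]
    return {
        "sessions": sessions,
        "tokens": day["tokens"],
        "min_utilization": sessions / daily_limit,
        # warn_ratio is a hypothetical threshold; lower it to get earlier warnings.
        "warn": sessions >= warn_ratio * daily_limit,
    }
```

Since each session can hide several API calls, a conservative warn_ratio gives more headroom than the raw count suggests.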

Querying Codex rate limits through background server mode

OpenAI’s Codex presents similar challenges with its buried rate limit information. The interactive /status command only works within the REPL, which rules out straightforward automation. After failed attempts involving tmux session manipulation, I discovered the codex app-server subcommand.

This lesser-known subcommand exposes JSON-RPC endpoints over stdin/stdout, including account/rateLimits/read, which returns structured quota data. Unlike the other tools, Codex provides officially documented endpoints, reducing future breakage risk.

To use it:

codex app-server --mode json-rpc

Then send requests to the server process:

{"jsonrpc":"2.0","method":"account/rateLimits/read","id":1}

The response includes tier-specific limits and remaining usage percentages, finally providing the real-time visibility developers need.
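
From a script, the exchange above amounts to starting the server, writing one JSON-RPC request to its stdin, and reading the matching response from its stdout. The sketch below assumes newline-delimited JSON messages and a server that answers and exits once stdin is closed; the real server may expect an initialization handshake or different framing, so treat the helper names build_request and call_once as illustrative only.

```python
import json
import subprocess


def build_request(method: str, request_id: int) -> str:
    """Serialize a JSON-RPC 2.0 request as one newline-terminated line."""
    return json.dumps({"jsonrpc": "2.0", "method": method, "id": request_id}) + "\n"


def call_once(command: list, method: str, request_id: int = 1,
              timeout: float = 10.0) -> dict:
    """Start `command`, send a single request on stdin, return the matching result."""
    proc = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    try:
        # communicate() writes the request, closes stdin, and collects stdout.
        out, _ = proc.communicate(build_request(method, request_id), timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()
        raise
    for line in out.splitlines():
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore log lines or other non-JSON output
        if isinstance(msg, dict) and msg.get("id") == request_id:
            if "error" in msg:
                raise RuntimeError(f"JSON-RPC error: {msg['error']}")
            return msg.get("result", {})
    raise RuntimeError("no response with a matching id")
```

With the command shown earlier, a call would look like call_once(["codex", "app-server", "--mode", "json-rpc"], "account/rateLimits/read"). Matching on the id field means notifications or log output the server emits alongside the response are ignored.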

Building a unified monitoring dashboard

Combining these three approaches into a single script creates a powerful workflow tool. Running hourly via cron, the collector:

Queries each source for current utilization
Calculates time remaining until reset
Generates a visual status line showing both rolling and weekly windows
Writes results to a shared JSON file that can feed a CLI status bar or desktop widget
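
For the last step in the list above, a status bar or widget may read the shared file at any moment, so it helps to replace the file in one step instead of writing it in place. A minimal sketch, assuming the collector and its readers run as the same user; the function name write_status and the layout of the status dictionary are illustrative.

```python
import json
import os
import tempfile


def write_status(path: str, status: dict) -> None:
    """Write `status` as JSON to `path` so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Write to a temporary file in the same directory, then swap it into place.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(status, fh, indent=2)
        os.replace(tmp_path, path)  # replaces the target in a single rename
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Creating the temporary file in the target directory keeps the final rename on one filesystem, which is what lets os.replace swap the file in a single operation.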

Example output format:

Session: ███░░░⏐░░░░░░░░░ 10% (3h12m left)
Weekly: ████████░░⏐░░░░░░░ 44% (Thu Mar 05 8pm PT)

Filled blocks represent consumed usage, while the marker shows your position in the time window. When blocks outpace the marker, you’re burning budget faster than time is passing.
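
A bar like this can be drawn from two fractions: how much of the quota is used and how much of the window has elapsed. The sketch below is one way to do it, with the marker overwriting the cell at the elapsed position; the rounding rule and the names render_bar and burning_fast are assumptions made for illustration.

```python
FILLED, EMPTY, MARKER = "█", "░", "⏐"


def render_bar(used: float, elapsed: float, width: int = 16) -> str:
    """Draw usage as filled blocks with a marker at the elapsed-time position."""
    used = min(max(used, 0.0), 1.0)
    elapsed = min(max(elapsed, 0.0), 1.0)
    filled = round(used * width)
    cells = [FILLED] * filled + [EMPTY] * (width - filled)
    # Clamp so a fully elapsed window still places the marker on the last cell.
    marker_at = min(int(elapsed * width), width - 1)
    cells[marker_at] = MARKER
    return "".join(cells)


def burning_fast(used: float, elapsed: float) -> bool:
    """True when usage is ahead of the clock for the current window."""
    return used > elapsed
```

For example, render_bar(0.25, 0.5, width=8) draws two filled blocks, two empty ones, the marker, and three more empty cells. Because the marker replaces a cell instead of being inserted, the bar keeps a fixed width, at the cost of hiding one block when usage has passed the marker.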

Future-proofing your quota monitoring

While undocumented endpoints provide immediate solutions, they carry long-term maintenance risks. The ideal approach would involve:

Official API endpoints for all three platforms
Consistent reset schedules across services
Real-time push notifications before hitting limits
Cross-platform quota sharing to maximize tool switching flexibility

Until these improvements arrive, developers must rely on creative workarounds like the monitoring system described here. The key is building flexibility into your solution so you can adapt when platforms change their internal systems.

Start small by implementing one endpoint at a time, then expand to cover all your tools. The investment in automation pays off immediately when you avoid the frustration of mid-task rate limit surprises.
