Track AI Spending by Feature with Per-Request LLM Cost Attribution

When finance teams review the monthly bill from OpenAI or Anthropic, they often find more questions than answers. A single vendor invoice shows total spend across models, features, and customer accounts, but it reveals nothing about which team’s experimental feature drove costs, which customer’s workflows became unprofitable, or which prompt template suddenly tripled token usage. The gap between billing data and operational insight is where per-request LLM cost attribution becomes essential.

Until recently, AI spending was treated as a minor line item in broader cloud costs. Yet the FinOps Foundation’s 2025 State of FinOps report shows a sharp shift: 63% of organizations now actively manage AI spending, double the share from the previous year. This surge reflects a new reality: AI expenses are no longer a side effect of infrastructure. They are a primary cost center with distinct unit economics, latency constraints, and ownership requirements. Tracking spend at the request level turns AI from a monthly surprise into a daily operational metric.

Why Per-Request Attribution Transforms AI Spend Management

A monthly vendor bill aggregates costs across unrelated workloads. A support chatbot, an internal code assistant, and a customer-facing generation feature may all share the same OpenAI account, yet their token usage patterns, model choices, and business impact vary wildly. When costs are lumped together, averages obscure the drivers behind spikes. Per-request attribution replaces vague totals with granular answers:

Which feature triggered the cost spike this afternoon?
Which team owns the workflow that consumed 40% of this month’s budget?
Which customer’s requests are now generating negative margins?
Did a prompt change or a retry loop inflate expenses unexpectedly?

Answering these questions requires cost data recorded at the same granularity as the questions themselves: one record per request, tagged with its owner and context. Without per-request tracking, teams are forced to reconstruct historical costs from incomplete logs, leading to delayed decisions and misallocated budgets.

The Essential Schema for AI Cost Tracking by Feature

You do not need a full-scale data platform to begin tracking LLM costs accurately. You do need a consistent event schema that captures both technical and business context. Each request record should include:

timestamp – When the request occurred, for trend analysis
provider and model – For example, OpenAI’s GPT-5.4 mini or Anthropic’s Claude Sonnet 4
input_tokens – Raw tokens processed before generation
cached_input_tokens – Tokens retrieved from cache, if supported
output_tokens – Tokens generated in response
request_id – A unique identifier for traceability
team – The engineering or product team responsible
feature – The specific product or workflow using the model
customer_id or workspace_id – For margin analysis and segmentation
environment – Production, staging, or development
status – Success, timeout, retry, or fallback mode

This schema ensures every cost can be traced back to its source. Without team, ownership remains unclear. Without feature, you cannot assess product-level efficiency. Without customer_id, margin analysis becomes impossible. And without status, retries are logged as normal demand, masking inflated costs.
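One way to keep this schema consistent is to encode it as a typed record that rejects unattributable events at creation time. The following is a minimal sketch, not a prescribed implementation: the class name `LLMRequestEvent`, the allowed-value sets, and the sample identifiers (including the model string `gpt-5.4-mini`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Allowed values mirror the field list above; the constant and class
# names are illustrative, not part of any provider SDK.
ENVIRONMENTS = {"production", "staging", "development"}
STATUSES = {"success", "timeout", "retry", "fallback"}


@dataclass(frozen=True)
class LLMRequestEvent:
    timestamp: datetime
    provider: str
    model: str
    input_tokens: int
    cached_input_tokens: int
    output_tokens: int
    request_id: str
    team: str
    feature: str
    customer_id: Optional[str]  # or a workspace_id, whichever you segment by
    environment: str
    status: str

    def __post_init__(self) -> None:
        # Reject records that could not be attributed later.
        if not self.team or not self.feature:
            raise ValueError("team and feature are required for attribution")
        if self.environment not in ENVIRONMENTS:
            raise ValueError(f"unknown environment: {self.environment!r}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        counts = (self.input_tokens, self.cached_input_tokens, self.output_tokens)
        if min(counts) < 0:
            raise ValueError("token counts must be non-negative")


# Sample record; every identifier here is hypothetical.
event = LLMRequestEvent(
    timestamp=datetime(2026, 6, 8, 14, 30, tzinfo=timezone.utc),
    provider="openai",
    model="gpt-5.4-mini",  # assumed identifier string
    input_tokens=8_000,
    cached_input_tokens=2_000,
    output_tokens=1_200,
    request_id="req-0001",
    team="support-platform",
    feature="support_chatbot",
    customer_id="cust-42",
    environment="production",
    status="success",
)
```

Failing fast on a missing team or feature is a design choice: it surfaces untagged call sites during development rather than at month-end reconciliation.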

Calculating OpenAI Cost Attribution Per Request

The math is simple, but timing and accuracy are critical. The formula for computing request-level cost is:

request_cost = (
    (input_tokens / 1_000_000 * input_rate)
    + (cached_input_tokens / 1_000_000 * cached_input_rate)
    + (output_tokens / 1_000_000 * output_rate)
    + tool_or_search_fees
)

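In this formula, input_tokens counts only uncached input tokens. If your provider reports cached tokens as a subset of the total input count, subtract them first so they are not charged at both rates. The challenge lies not in the calculation, but in storing the correct pricing rates for the exact model and version used on the day of the request. As of June 8, 2026, OpenAI’s pricing for GPT-5.4 mini is: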

Input tokens: $0.75 per 1M
Cached input tokens: $0.075 per 1M
Output tokens: $4.50 per 1M

For a typical request with 8,000 input tokens, 2,000 cached input tokens, and 1,200 output tokens, the cost breaks down as:

Input: 8,000 / 1,000,000 × 0.75 = $0.006
Cached input: 2,000 / 1,000,000 × 0.075 = $0.00015
Output: 1,200 / 1,000,000 × 4.50 = $0.0054
Total: $0.01155 per request
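The breakdown above can be reproduced with a small function. This sketch uses Python's `Decimal` type, an implementation choice made here so that sub-cent amounts are not distorted by binary floating point; the function and parameter names simply follow the formula's terms.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


def request_cost(
    input_tokens: int,
    cached_input_tokens: int,
    output_tokens: int,
    input_rate: Decimal,
    cached_input_rate: Decimal,
    output_rate: Decimal,
    tool_or_search_fees: Decimal = Decimal("0"),
) -> Decimal:
    """Return the request cost in USD; rates are USD per 1M tokens."""
    return (
        Decimal(input_tokens) / PER_MILLION * input_rate
        + Decimal(cached_input_tokens) / PER_MILLION * cached_input_rate
        + Decimal(output_tokens) / PER_MILLION * output_rate
        + tool_or_search_fees
    )


# The worked example: 8,000 input, 2,000 cached input, 1,200 output tokens.
cost = request_cost(
    8_000,
    2_000,
    1_200,
    input_rate=Decimal("0.75"),
    cached_input_rate=Decimal("0.075"),
    output_rate=Decimal("4.50"),
)
assert cost == Decimal("0.01155")
```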

At 10,000 daily requests, this pattern generates approximately $115.50 per day, or $3,465 over a 30-day month. When this cost is stored at ingestion time, dashboards can reflect accurate historical spend even as pricing tables evolve. Reconstructing costs retroactively from raw logs leads to inconsistencies, especially when model pricing changes or new features are introduced.
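Storing cost at ingestion implies a price table keyed by effective date, so each request is priced with the rates in force on its own day. The sketch below is one possible shape for that lookup. The `PRICE_HISTORY` structure, the `rates_on` and `stamp_cost` helpers, the model string, and both effective dates are assumptions; the second price row is invented purely to demonstrate the lookup and is not a real or announced price.

```python
from bisect import bisect_right
from datetime import date
from decimal import Decimal
from typing import NamedTuple

PER_MILLION = Decimal(1_000_000)


class Rates(NamedTuple):
    effective_from: date
    input_rate: Decimal         # USD per 1M tokens
    cached_input_rate: Decimal  # USD per 1M tokens
    output_rate: Decimal        # USD per 1M tokens


# Rows must be sorted by effective_from.
PRICE_HISTORY = {
    ("openai", "gpt-5.4-mini"): [
        # Rates quoted in this article; the start date is a placeholder.
        Rates(date(2026, 1, 1), Decimal("0.75"), Decimal("0.075"), Decimal("4.50")),
        # HYPOTHETICAL row, included only to show how a price change is handled.
        Rates(date(2026, 9, 1), Decimal("0.60"), Decimal("0.06"), Decimal("4.00")),
    ],
}


def rates_on(provider: str, model: str, day: date) -> Rates:
    """Return the latest price row whose effective_from is on or before day."""
    history = PRICE_HISTORY[(provider, model)]
    starts = [row.effective_from for row in history]
    idx = bisect_right(starts, day) - 1
    if idx < 0:
        raise LookupError(f"no pricing for {provider}/{model} on {day}")
    return history[idx]


def stamp_cost(event: dict) -> dict:
    """Return a copy of the event with cost_usd computed from that day's rates."""
    r = rates_on(event["provider"], event["model"], event["day"])
    cost = (
        Decimal(event["input_tokens"]) / PER_MILLION * r.input_rate
        + Decimal(event["cached_input_tokens"]) / PER_MILLION * r.cached_input_rate
        + Decimal(event["output_tokens"]) / PER_MILLION * r.output_rate
    )
    return {**event, "cost_usd": cost, "rates_effective_from": r.effective_from}
```

Recording `rates_effective_from` alongside the cost makes it possible to audit later which price row a stored figure was based on.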

Anthropic Spend Tracking: Caching, Long Context, and Edge Cases

Tracking costs for Anthropic models follows a similar logic, but with nuances that can significantly alter the bottom line. As of June 8, 2026, Anthropic lists Claude Sonnet 4 at:

Input tokens: $3 per 1M
Output tokens: $15 per 1M

Anthropic also applies special pricing for cache operations:

Cache reads: 10% of base input pricing
5-minute cache writes: 1.25x base input pricing

For a standard request with 8,000 input tokens and 1,200 output tokens, the cost is:

Input: 8,000 / 1,000,000 × 3 = $0.024
Output: 1,200 / 1,000,000 × 15 = $0.018
Total: $0.042 per request

At 2,000 daily requests, this totals $84 per day, or $2,520 over a 30-day month.
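To see how the cache multipliers change the picture, the sketch below prices a request as four token buckets: uncached input, cache reads, 5-minute cache writes, and output. The function name and the split of 8,000 input tokens into 2,000 uncached and 6,000 cached are hypothetical; only the rates and multipliers come from the figures above.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)
BASE_INPUT_RATE = Decimal("3")    # USD per 1M input tokens
OUTPUT_RATE = Decimal("15")       # USD per 1M output tokens
CACHE_READ_MULTIPLIER = Decimal("0.10")      # 10% of base input pricing
CACHE_WRITE_5M_MULTIPLIER = Decimal("1.25")  # 1.25x base input pricing


def sonnet_request_cost(
    uncached_input: int, cache_read: int, cache_write_5m: int, output: int
) -> Decimal:
    """Cost in USD for one request, with each input bucket priced separately."""
    per_input_token = BASE_INPUT_RATE / PER_MILLION
    return (
        uncached_input * per_input_token
        + cache_read * per_input_token * CACHE_READ_MULTIPLIER
        + cache_write_5m * per_input_token * CACHE_WRITE_5M_MULTIPLIER
        + output * OUTPUT_RATE / PER_MILLION
    )


# The standard request from the text: no caching involved.
standard = sonnet_request_cost(8_000, 0, 0, 1_200)

# Hypothetical split: 6,000 of the 8,000 input tokens are a shared prefix.
first_call = sonnet_request_cost(2_000, 0, 6_000, 1_200)  # writes the cache
later_call = sonnet_request_cost(2_000, 6_000, 0, 1_200)  # reads the cache

assert standard == Decimal("0.042")
assert first_call == Decimal("0.0465")
assert later_call == Decimal("0.0258")
```

Under these assumptions the first call costs slightly more than the uncached request, and each later call within the cache lifetime costs noticeably less, which is why cache reads and writes need their own columns in the event record.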

The most overlooked cost driver in Anthropic models is long context. When requests exceed 200,000 input tokens with the 1M context window enabled, input pricing jumps from $3 to $6 per 1M, and output pricing rises from $15 to $22.50 per 1M. A single oversized request with 250,000 input tokens and 2,000 output tokens costs:

Input: 250,000 / 1,000,000 × 6 = $1.50
Output: 2,000 / 1,000,000 × 22.50 = $0.045
Total: $1.545 for one request

Ignoring the context tier can understate the true cost of a long-context request by roughly half: priced at the base rates, the request above would appear to cost $0.78 instead of $1.545. Teams using long-context prompts should record the context tier in their schema to avoid surprises during billing reconciliation.
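A tier-aware calculation can be sketched as a lookup that selects rates based on input size. The tier labels, function names, and the `long_context_enabled` flag are assumptions for illustration; the threshold and rates are the ones quoted above.

```python
from decimal import Decimal
from typing import Tuple

PER_MILLION = Decimal(1_000_000)
LONG_CONTEXT_THRESHOLD = 200_000  # input tokens

# (input_rate, output_rate) in USD per 1M tokens.
TIER_RATES = {
    "standard": (Decimal("3"), Decimal("15")),
    "long_context": (Decimal("6"), Decimal("22.50")),
}


def context_tier(input_tokens: int, long_context_enabled: bool) -> str:
    """Pick the pricing tier; the premium applies only above the threshold."""
    if long_context_enabled and input_tokens > LONG_CONTEXT_THRESHOLD:
        return "long_context"
    return "standard"


def tiered_cost(
    input_tokens: int, output_tokens: int, long_context_enabled: bool = True
) -> Tuple[str, Decimal]:
    """Return (tier, cost in USD) so the tier can be stored with the event."""
    tier = context_tier(input_tokens, long_context_enabled)
    input_rate, output_rate = TIER_RATES[tier]
    cost = (
        Decimal(input_tokens) / PER_MILLION * input_rate
        + Decimal(output_tokens) / PER_MILLION * output_rate
    )
    return tier, cost


assert tiered_cost(250_000, 2_000) == ("long_context", Decimal("1.545"))
assert tiered_cost(8_000, 1_200) == ("standard", Decimal("0.042"))
```

Returning the tier together with the cost lets the pipeline persist both, so a reconciliation query can filter on long-context requests directly.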

Start Simple: Build vs Gateway vs Third-Party Tools

Choosing the right approach depends on your current infrastructure and scale. For teams just beginning, a lightweight solution is to enrich gateway logs with business context. Most API gateways already capture tokens and provider details, but they rarely include ownership tags like team, feature, or customer_id. By appending these dimensions at the gateway level, you can generate per-request cost reports without building a new system.
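As a rough illustration of gateway-level enrichment, the sketch below merges ownership tags from request headers into an existing log record. The header names, the `enrich_gateway_log` helper, and the `"unattributed"` fallback are all hypothetical; real gateways differ in how they expose headers and log fields.

```python
from typing import Mapping

# Hypothetical header names; use whatever convention your services agree on.
TAG_HEADERS = {
    "team": "x-llm-team",
    "feature": "x-llm-feature",
    "customer_id": "x-llm-customer-id",
}


def enrich_gateway_log(log_record: Mapping, headers: Mapping[str, str]) -> dict:
    """Return a copy of the log record with ownership tags added."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    tags = {
        field: normalized.get(header, "unattributed")
        for field, header in TAG_HEADERS.items()
    }
    return {**log_record, **tags}


record = enrich_gateway_log(
    {"request_id": "req-0001", "model": "gpt-5.4-mini", "input_tokens": 8_000},
    {"X-LLM-Team": "support-platform", "X-LLM-Feature": "support_chatbot"},
)
assert record["team"] == "support-platform"
assert record["customer_id"] == "unattributed"
```

Tagging missing values as `"unattributed"` rather than dropping the record keeps total spend complete while making the untagged share visible in reports.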

For organizations with heavier usage, a small data pipeline that ingests LLM requests, enriches them with pricing tables, and computes cost in real time may be more reliable. Storing the calculated cost at ingest ensures that dashboards reflect accurate historical spend, even if model pricing changes months later.
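Once each event carries a stored cost, the reporting side of such a pipeline can be a simple rollup. The following sketch groups spend by team and feature and tracks retry spend separately; the `spend_by_feature` helper, the dictionary keys, and the sample events are assumptions for illustration.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Tuple


def spend_by_feature(events: Iterable[dict]) -> Dict[Tuple[str, str], dict]:
    """Sum stored per-request costs by (team, feature), isolating retries."""
    totals = defaultdict(
        lambda: {
            "requests": 0,
            "cost_usd": Decimal("0"),
            "retry_cost_usd": Decimal("0"),
        }
    )
    for event in events:
        bucket = totals[(event["team"], event["feature"])]
        bucket["requests"] += 1
        bucket["cost_usd"] += event["cost_usd"]
        if event["status"] == "retry":
            # Retry spend is tracked separately so it is not read as demand.
            bucket["retry_cost_usd"] += event["cost_usd"]
    return dict(totals)


# Hypothetical events that already carry a cost stamped at ingestion.
sample = [
    {"team": "support", "feature": "chatbot", "status": "success", "cost_usd": Decimal("0.01155")},
    {"team": "support", "feature": "chatbot", "status": "retry", "cost_usd": Decimal("0.01155")},
    {"team": "docs", "feature": "summarizer", "status": "success", "cost_usd": Decimal("0.042")},
]
report = spend_by_feature(sample)
assert report[("support", "chatbot")]["cost_usd"] == Decimal("0.0231")
assert report[("support", "chatbot")]["retry_cost_usd"] == Decimal("0.01155")
```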

Alternatively, emerging third-party tools now specialize in AI cost attribution. These platforms automate schema enforcement, pricing lookups, and dashboarding, reducing the engineering overhead. However, they may come with vendor lock-in or limited customization for niche use cases.

The best path forward depends on your FinOps maturity. But one thing is clear: once AI spend reaches thousands of dollars per month, averages are not enough. The future of AI cost management belongs to teams that can trace every token, and every dollar, back to the workflow that incurred it.
