When finance teams review the monthly bill from OpenAI or Anthropic, they often find more questions than answers. A single vendor invoice shows total spend across models, features, and customer accounts, but it reveals nothing about which team’s experimental feature drove costs, which customer’s workflows became unprofitable, or which prompt template suddenly tripled token usage. The gap between billing data and operational insight is where per-request LLM cost attribution becomes essential.
Until recently, AI spending was treated as a minor line item in broader cloud costs. Yet the FinOps Foundation’s 2025 State of FinOps report reveals a striking shift: 63% of organizations now actively manage AI spending, roughly double the share reported the previous year. This surge reflects a new reality: AI expenses are no longer a side effect of infrastructure. They are a primary cost center with distinct unit economics, latency constraints, and ownership requirements. Tracking spend at the request level turns AI from a monthly surprise into a daily operational metric.
Why Per-Request Attribution Transforms AI Spend Management
A monthly vendor bill aggregates costs across unrelated workloads. A support chatbot, an internal code assistant, and a customer-facing generation feature may all share the same OpenAI account, yet their token usage patterns, model choices, and business impact vary wildly. When costs are lumped together, averages obscure the drivers behind spikes. Per-request attribution replaces vague totals with granular answers:
- Which feature triggered the cost spike this afternoon?
- Which team owns the workflow that consumed 40% of this month’s budget?
- Which customer’s requests are now generating negative margins?
- Did a prompt change or a retry loop inflate expenses unexpectedly?
These questions require a denominator that matches the granularity of the spend. Without per-request tracking, teams are forced to reconstruct historical costs from incomplete logs, leading to delayed decisions and misallocated budgets.
The Essential Schema for AI Cost Tracking by Feature
You do not need a full-scale data platform to begin tracking LLM costs accurately. You do need a consistent event schema that captures both technical and business context. Each request record should include:
- timestamp: When the request occurred, for trend analysis
- provider and model: For example, OpenAI’s GPT-5.4 mini or Anthropic’s Claude Sonnet 4
- input_tokens: Raw tokens processed before generation
- cached_input_tokens: Tokens retrieved from cache, if supported
- output_tokens: Tokens generated in response
- request_id: A unique identifier for traceability
- team: The engineering or product team responsible
- feature: The specific product or workflow using the model
- customer_id or workspace_id: For margin analysis and segmentation
- environment: Production, staging, or development
- status: Success, timeout, retry, or fallback mode
This schema ensures every cost can be traced back to its source. Without team, ownership remains unclear. Without feature, you cannot assess product-level efficiency. Without customer_id, margin analysis becomes impossible. And without status, retries are logged as normal demand, masking inflated costs.
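One way to keep the schema consistent is to define it as a typed record and reject events that arrive without ownership tags. The sketch below is a minimal illustration, not a prescribed implementation: the class name `LLMRequestEvent`, the example values, and the validation rules are assumptions layered on top of the field list above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class LLMRequestEvent:
    """One record per LLM request, mirroring the field list above."""
    timestamp: datetime
    provider: str
    model: str
    input_tokens: int
    output_tokens: int
    request_id: str
    team: str
    feature: str
    environment: str   # "production", "staging", or "development"
    status: str        # "success", "timeout", "retry", or "fallback"
    cached_input_tokens: int = 0
    customer_id: Optional[str] = None
    workspace_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed rule: ownership tags must be non-blank, since attribution
        # depends on them.
        for name in ("team", "feature"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        # Assumed rule: at least one tenant identifier is required.
        if self.customer_id is None and self.workspace_id is None:
            raise ValueError("customer_id or workspace_id is required")


# Example values are made up for illustration.
event = LLMRequestEvent(
    timestamp=datetime(2026, 6, 8, 14, 30, tzinfo=timezone.utc),
    provider="openai",
    model="gpt-5.4-mini",
    input_tokens=8_000,
    cached_input_tokens=2_000,
    output_tokens=1_200,
    request_id="req-0001",
    team="support-platform",
    feature="ticket-summary",
    customer_id="cust-42",
    environment="production",
    status="success",
)
```

Failing at construction time means an untagged request surfaces as an error in the calling service rather than as an unattributable row in a dashboard weeks later.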
Calculating OpenAI Cost Attribution Per Request
The math is simple, but timing and accuracy are critical. The formula for computing request-level cost is:
request_cost = (
(input_tokens / 1_000_000 * input_rate)
+ (cached_input_tokens / 1_000_000 * cached_input_rate)
+ (output_tokens / 1_000_000 * output_rate)
+ tool_or_search_fees
)
The challenge lies not in the calculation, but in storing the correct pricing rates for the exact model and version used on the day of the request. As of June 8, 2026, OpenAI’s current pricing for GPT-5.4 mini is:
- Input tokens: $0.75 per 1M
- Cached input tokens: $0.075 per 1M
- Output tokens: $4.50 per 1M
For a typical request with 8,000 uncached input tokens, 2,000 cached input tokens, and 1,200 output tokens, the cost breaks down as:
- Input: 8,000 / 1,000,000 × 0.75 = $0.006
- Cached input: 2,000 / 1,000,000 × 0.075 = $0.00015
- Output: 1,200 / 1,000,000 × 4.50 = $0.0054
- Total: $0.01155 per request
At 10,000 daily requests, this pattern generates $115.50 per day, or $3,465 over a 30-day month. When this cost is stored at ingestion time, dashboards can reflect accurate historical spend even as pricing tables evolve. Reconstructing costs retroactively from raw logs leads to inconsistencies, especially when model pricing changes or new features are introduced.
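The worked example can be reproduced in a few lines. This is a sketch under stated assumptions: the rates are the GPT-5.4 mini figures quoted above, `input_tokens` is taken to mean uncached input only, and `Decimal` is used so that sub-cent amounts do not pick up floating-point noise. The names `GPT_5_4_MINI_RATES` and `request_cost` are illustrative.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

# USD per 1M tokens, copied from the GPT-5.4 mini figures quoted above.
GPT_5_4_MINI_RATES = {
    "input": Decimal("0.75"),
    "cached_input": Decimal("0.075"),
    "output": Decimal("4.50"),
}


def request_cost(input_tokens, cached_input_tokens, output_tokens,
                 rates, tool_or_search_fees=Decimal("0")):
    """Cost of one request in USD; input_tokens excludes cached tokens."""
    return (
        input_tokens / PER_MILLION * rates["input"]
        + cached_input_tokens / PER_MILLION * rates["cached_input"]
        + output_tokens / PER_MILLION * rates["output"]
        + tool_or_search_fees
    )


cost = request_cost(8_000, 2_000, 1_200, GPT_5_4_MINI_RATES)
daily = cost * 10_000      # 10,000 requests per day
monthly = daily * 30       # assumes a 30-day month

print(f"per request: ${cost:.5f}")
print(f"per day:     ${daily:.2f}")
print(f"per month:   ${monthly:.2f}")
```

If your provider reports cached tokens as a subset of total prompt tokens, subtract them before calling `request_cost`, otherwise the cached portion is billed twice in your own books.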
Anthropic Spend Tracking: Caching, Long Context, and Edge Cases
Tracking costs for Anthropic models follows a similar logic, but with nuances that can significantly alter the bottom line. As of the same date, Anthropic lists Claude Sonnet 4 at:
- Input tokens: $3 per 1M
- Output tokens: $15 per 1M
Anthropic also applies special pricing for cache operations:
- Cache reads: 10% of base input pricing
- 5-minute cache writes: 1.25x base input pricing
For a standard request with 8,000 input tokens and 1,200 output tokens, the cost is:
- Input: 8,000 / 1,000,000 × 3 = $0.024
- Output: 1,200 / 1,000,000 × 15 = $0.018
- Total: $0.042 per request
At 2,000 daily requests, this totals $84 per day, or $2,520 over a 30-day month.
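The cache multipliers change the per-request figure once part of the prompt is served from cache. The following sketch applies the 10% read and 1.25x write multipliers listed above to the base input rate. The function name and the token split in the second example are assumptions chosen for illustration.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

# Claude Sonnet 4 base rates in USD per 1M tokens, as quoted above.
INPUT_RATE = Decimal("3")
OUTPUT_RATE = Decimal("15")

# Cache multipliers relative to the base input rate, as quoted above.
CACHE_READ_MULTIPLIER = Decimal("0.10")
CACHE_WRITE_5M_MULTIPLIER = Decimal("1.25")


def sonnet4_request_cost(input_tokens, output_tokens,
                         cache_read_tokens=0, cache_write_tokens=0):
    """Cost in USD; input_tokens counts only uncached, non-written input."""
    input_cost = input_tokens / PER_MILLION * INPUT_RATE
    read_cost = (cache_read_tokens / PER_MILLION
                 * INPUT_RATE * CACHE_READ_MULTIPLIER)
    write_cost = (cache_write_tokens / PER_MILLION
                  * INPUT_RATE * CACHE_WRITE_5M_MULTIPLIER)
    output_cost = output_tokens / PER_MILLION * OUTPUT_RATE
    return input_cost + read_cost + write_cost + output_cost


# The standard request from the text: no caching involved.
standard = sonnet4_request_cost(8_000, 1_200)

# Hypothetical variant: same 8,000-token prompt, 6,000 of it read from cache.
mostly_cached = sonnet4_request_cost(2_000, 1_200, cache_read_tokens=6_000)

print(f"standard:      ${standard:.4f}")
print(f"mostly cached: ${mostly_cached:.4f}")
```

Logging cache reads and writes as separate token counts is what makes this split possible; a single combined input count cannot be priced correctly after the fact.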
The most overlooked cost driver in Anthropic models is long context. When requests exceed 200,000 input tokens with the 1M context window enabled, input pricing jumps from $3 to $6 per 1M, and output pricing rises from $15 to $22.50 per 1M. A single oversized request with 250,000 input tokens and 2,000 output tokens costs:
- Input: 250,000 / 1,000,000 × 6 = $1.50
- Output: 2,000 / 1,000,000 × 22.50 = $0.045
- Total: $1.545 for one request
Ignoring the context tier understates the cost of this request by roughly half: at base rates the same tokens would be priced at $0.78 instead of $1.545. Teams using long-context prompts should record the context tier in their schema to avoid surprises during billing reconciliation.
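Tier selection can be made explicit in code so that the tier is stored next to the cost. The sketch below encodes the 200,000-token threshold and the two rate pairs described above. The function names, the `"standard"` and `"long"` labels, and the `long_context_enabled` flag are hypothetical.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)
LONG_CONTEXT_THRESHOLD = 200_000  # input tokens, per the text above


def sonnet4_rates(input_tokens, long_context_enabled):
    """Return (context_tier, input_rate, output_rate) in USD per 1M tokens."""
    if long_context_enabled and input_tokens > LONG_CONTEXT_THRESHOLD:
        return "long", Decimal("6"), Decimal("22.50")
    return "standard", Decimal("3"), Decimal("15")


def tiered_cost(input_tokens, output_tokens, long_context_enabled=True):
    """Return (context_tier, cost_usd) so both can be stored on the event."""
    tier, in_rate, out_rate = sonnet4_rates(input_tokens, long_context_enabled)
    cost = (input_tokens * in_rate + output_tokens * out_rate) / PER_MILLION
    return tier, cost


tier, actual = tiered_cost(250_000, 2_000)

# For comparison only: what a tier-unaware calculation would report.
naive = (250_000 * Decimal("3") + 2_000 * Decimal("15")) / PER_MILLION

print(tier, f"${actual:.3f}", f"naive ${naive:.2f}")
```

Note that in this sketch the higher rates apply to the whole request once the threshold is crossed, matching the worked example, rather than only to the tokens above 200,000.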
Start Simple: Build vs Gateway vs Third-Party Tools
Choosing the right approach depends on your current infrastructure and scale. For teams just beginning, a lightweight solution is to enrich gateway logs with business context. Most API gateways already capture tokens and provider details, but they rarely include ownership tags like team, feature, or customer_id. By appending these dimensions at the gateway level, you can generate per-request cost reports without building a new system.
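Enriching a gateway log can be as small as copying a few caller-supplied tags onto each record. The sketch below assumes callers send ownership tags as request headers. The header names (`x-llm-team` and so on) and the `"unattributed"` fallback are hypothetical conventions, not part of any particular gateway product.

```python
# Hypothetical header names; substitute whatever your gateway forwards.
TAG_HEADERS = {
    "team": "x-llm-team",
    "feature": "x-llm-feature",
    "customer_id": "x-llm-customer-id",
}


def enrich_gateway_log(log_record: dict, request_headers: dict) -> dict:
    """Return a copy of log_record with ownership tags added.

    Missing tags are marked "unattributed" so the gap is visible in
    reports instead of silently dropped.
    """
    # Header names are case-insensitive, so normalize before lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    enriched = dict(log_record)
    for field, header in TAG_HEADERS.items():
        enriched[field] = headers.get(header, "unattributed")
    return enriched


raw = {"request_id": "req-0001", "model": "gpt-5.4-mini",
       "input_tokens": 8_000, "output_tokens": 1_200}
tagged = enrich_gateway_log(
    raw, {"X-LLM-Team": "search", "X-LLM-Feature": "autocomplete"}
)
```

Tracking the share of spend that lands in the `"unattributed"` bucket gives a simple measure of how complete the tagging rollout is.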
For organizations with heavier usage, a small data pipeline that ingests LLM requests, enriches them with pricing tables, and computes cost in real time may be more reliable. Storing the calculated cost at ingest ensures that dashboards reflect accurate historical spend, even if model pricing changes months later.
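Storing cost at ingest depends on looking up the rates that were in effect when the request ran. A minimal sketch of that lookup follows, assuming a small in-memory price history keyed by provider and model. The effective dates are illustrative, the second entry is a made-up placeholder rather than a real price, and the model identifier `gpt-5.4-mini` is an assumed spelling.

```python
from bisect import bisect_right
from datetime import date, datetime
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

# (effective_date, rates in USD per 1M tokens), sorted by date.
PRICE_HISTORY = {
    ("openai", "gpt-5.4-mini"): [
        # Rates quoted in the text; the effective date is illustrative.
        (date(2026, 6, 8), {"input": Decimal("0.75"),
                            "cached_input": Decimal("0.075"),
                            "output": Decimal("4.50")}),
        # PLACEHOLDER future change, invented to exercise the lookup.
        (date(2027, 1, 1), {"input": Decimal("1.00"),
                            "cached_input": Decimal("0.10"),
                            "output": Decimal("5.00")}),
    ],
}


def rates_on(provider, model, day):
    """Return the rates in effect for provider/model on the given date."""
    history = PRICE_HISTORY[(provider, model)]
    effective_dates = [effective for effective, _ in history]
    index = bisect_right(effective_dates, day) - 1
    if index < 0:
        raise LookupError(f"no price for {provider}/{model} on {day}")
    return history[index][1]


def ingest(event: dict) -> dict:
    """Attach cost_usd using the rates valid on the event's own date."""
    rates = rates_on(event["provider"], event["model"],
                     event["timestamp"].date())
    cost = (
        event["input_tokens"] / PER_MILLION * rates["input"]
        + event["cached_input_tokens"] / PER_MILLION * rates["cached_input"]
        + event["output_tokens"] / PER_MILLION * rates["output"]
    )
    return {**event, "cost_usd": cost}


stored = ingest({
    "provider": "openai", "model": "gpt-5.4-mini",
    "timestamp": datetime(2026, 6, 10, 9, 0),
    "input_tokens": 8_000, "cached_input_tokens": 2_000,
    "output_tokens": 1_200,
})
```

Because `cost_usd` is written once and never recomputed, a later change to `PRICE_HISTORY` affects only new events, which is the property the text asks for.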
Alternatively, emerging third-party tools now specialize in AI cost attribution. These platforms automate schema enforcement, pricing lookups, and dashboarding, reducing the engineering overhead. However, they may come with vendor lock-in or limited customization for niche use cases.
The best path forward depends on your FinOps maturity. But one thing is clear: when AI spend is measured in thousands per month, averages are not enough. The future of AI cost management belongs to teams that can trace every token, and every dollar, back to the workflow that incurred it.