iToverDose/Software· 22 MAY 2026 · 00:04

Track AI app costs per user in minutes without extra SDKs

A $400 OpenAI bill with no breakdown by user can sink your AI app. Learn three robust methods to attribute costs per customer in real time—no complex setups required.

DEV Community · 4 min read

When your AI-powered feature goes live, usage spikes fast—and so do expenses. One week after launch, a seemingly small feature can generate a $400 OpenAI bill with no clear breakdown of which customers drove the cost. Without precise per-user cost attribution, scaling your AI app becomes a guessing game.

The gap is operational as much as technical. Without visibility into who is spending what, you risk losing money on heavy free-tier users, underpricing power users, or missing runaway costs after a release. Per-user spend tracking needs little setup. Here are three practical approaches, ranked by setup time, to embed cost attribution into your AI infrastructure.

Wrap your LLM client in minutes (works with Express, Fastify, Next.js)

If your app routes through a single OpenAI or Anthropic client, you can start tracking user-level costs in under five minutes by wrapping the provider client.

import OpenAI from 'openai'
import { wrapOpenAI, withTrace } from '@voightxyz/openai'

// Wrap the client once at startup
const openai = wrapOpenAI(new OpenAI(), {
  agent: 'production-chat-api',
})

// In your route handler
app.post('/api/chat', async (req, res) => {
  await withTrace(
    async () => {
      const response = await openai.chat.completions.create({
        model: 'gpt-4o-mini',
        messages: req.body.messages,
      })
      res.json({ reply: response.choices[0].message })
    },
    {
      routeTag: 'POST /api/chat',
      tags: {
        userId: req.user.id,
        plan: req.user.plan,
      },
    }
  )
})

The key is using withTrace at the request boundary. Every LLM call triggered within that block inherits the userId and plan tags automatically through Node.js’s AsyncLocalStorage. You avoid threading user identifiers through every function call or embedding them in prompts.

  • ✅ Best for: Microservices, REST APIs, and monoliths using Express, Fastify, or Next.js API routes.
  • ❌ Limitation: Requires using the wrapper SDKs for OpenAI or Anthropic.
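The propagation described above can be illustrated without any wrapper SDK. The following is a minimal sketch of the underlying mechanism, not the wrapper's actual implementation: `runWithTags`, `currentTags`, and `recordLlmCall` are hypothetical names introduced here, and only `AsyncLocalStorage` itself is a real Node.js API.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

// Tags attached to every LLM call made while handling one request.
type TraceTags = Record<string, string>

const traceStore = new AsyncLocalStorage<TraceTags>()

// Hypothetical stand-in for a withTrace-style helper: runs fn with the
// given tags bound to the current async context.
function runWithTags<T>(tags: TraceTags, fn: () => Promise<T>): Promise<T> {
  return traceStore.run(tags, fn)
}

// Reads the tags of whichever request is currently executing,
// or an empty object when called outside any traced context.
function currentTags(): TraceTags {
  return traceStore.getStore() ?? {}
}

// A nested helper that never receives userId as an argument.
async function recordLlmCall(model: string): Promise<{ model: string; tags: TraceTags }> {
  await new Promise((resolve) => setTimeout(resolve, 5)) // simulate network I/O
  return { model, tags: currentTags() }
}
```

Two requests handled concurrently each see only their own tags, because the store follows the async call chain rather than living in a shared global.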

Use OpenTelemetry metadata with the Vercel AI SDK

If you’re using the Vercel AI SDK, you can leverage its experimental telemetry system to inject user context into observability traces.

import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'

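export async function POST(req: Request) {
  // `session` is assumed to come from your auth layer (lookup not shown in the source)
  const { prompt } = await req.json()
  const result = streamText({
    model: openai('gpt-4o-mini'),
    prompt,
    experimental_telemetry: {
      isEnabled: true,
      // Attached to the emitted spans as custom attributes
      metadata: {
        userId: session.user.id,
        plan: session.user.plan,
      },
    },
  })
  // toAIStreamResponse() was removed from newer AI SDK releases;
  // use the stream-response helper that matches your SDK version
  return result.toTextStreamResponse()
}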

This approach emits OpenTelemetry-compatible spans with custom metadata. Once an OpenTelemetry exporter is configured, tools like Langfuse, Arize Phoenix, Braintrust, or Datadog can ingest these attributes, letting you query and analyze costs by user.

  • ✅ Best for: Serverless apps, Vercel deployments, and teams already using OpenTelemetry.
  • ❌ Limitation: Only works if your SDK emits OpenTelemetry-compatible spans.
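If you process the exported spans yourself, the user tags have to be read back out of the span attributes. The sketch below assumes the metadata keys arrive under an `ai.telemetry.metadata.` prefix; treat that prefix as an assumption and confirm it against the spans your exporter actually receives. `tagsFromSpan` and `SpanAttributes` are hypothetical names.

```typescript
// Assumption: telemetry metadata is recorded as span attributes under
// this prefix. Verify against your own exported spans.
const METADATA_PREFIX = 'ai.telemetry.metadata.'

type SpanAttributes = Record<string, string | number | boolean>

// Collects metadata attributes into a plain tag object, dropping the
// prefix and ignoring every unrelated attribute.
function tagsFromSpan(attributes: SpanAttributes): Record<string, string> {
  const tags: Record<string, string> = {}
  for (const [key, value] of Object.entries(attributes)) {
    if (key.startsWith(METADATA_PREFIX)) {
      tags[key.slice(METADATA_PREFIX.length)] = String(value)
    }
  }
  return tags
}
```

Keeping the prefix in one constant means a change in attribute naming between SDK versions touches a single line.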

Manually emit events for background workers and agents

Background jobs, autonomous agents, or non-HTTP workflows don’t have a natural request boundary. In these cases, time each LLM call yourself and emit a cost event once it returns.

import { Voight } from '@voightxyz/sdk'

const voight = new Voight({ agentId: 'my-bot' })

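// The endpoint URL was elided in the source; CHAT_COMPLETIONS_URL is a
// hypothetical variable pointing at your provider's chat completions endpoint
const CHAT_COMPLETIONS_URL = process.env.CHAT_COMPLETIONS_URL

const startTime = Date.now()
const response = await fetch(CHAT_COMPLETIONS_URL, {
  method: 'POST',
  headers: {
    authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    'content-type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-4o-mini',
    messages: [...], // conversation messages, elided in the source
  }),
}).then((r) => r.json())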

voight.log({
  type: 'reasoning',
  model: 'gpt-4o-mini',
  durationMs: Date.now() - startTime,
  outcome: 'success',
  metadata: {
    tokens: {
      input: response.usage.prompt_tokens,
      output: response.usage.completion_tokens,
    },
    tags: {
      userId: job.userId,
      tenantId: job.tenantId,
    },
  },
})

This gives you full control over what gets logged, including token counts, duration, and custom tags. It’s more boilerplate but ideal when LLM calls bypass your HTTP layer entirely.

  • ✅ Best for: Background workers, queue-based systems, and proxy setups.
  • ❌ Limitation: Requires you to read token usage from each response and emit the event yourself.
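The logged token counts only become a dollar figure once they are multiplied by a price. A minimal sketch of that arithmetic follows. The price table is a placeholder with made-up numbers for a made-up model name, and `estimateCostUsd` is a hypothetical helper; substitute your provider's current rates.

```typescript
// Hypothetical per-million-token prices in USD. These numbers are
// placeholders, not real provider pricing.
type Price = { inputPerMillion: number; outputPerMillion: number }

const PRICES: Record<string, Price> = {
  'example-model': { inputPerMillion: 1.0, outputPerMillion: 4.0 },
}

// Mirrors the usage fields read from the response in the snippet above.
type Usage = { prompt_tokens: number; completion_tokens: number }

// cost = (input tokens * input price + output tokens * output price) / 1,000,000
function estimateCostUsd(model: string, usage: Usage): number {
  const price = PRICES[model]
  if (!price) throw new Error(`No price configured for model: ${model}`)
  const total =
    usage.prompt_tokens * price.inputPerMillion +
    usage.completion_tokens * price.outputPerMillion
  return total / 1_000_000
}
```

Throwing on an unknown model is a deliberate choice here: silently returning zero would hide spend whenever a new model is rolled out without a price entry.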

What becomes possible once user tags are in place

With userId consistently attached to every LLM event, you unlock powerful queries without additional tools or SDKs:

  • Identify your highest-spending users this month by grouping events by userId and summing costs.
  • Spot free-tier users who consume a disproportionate share of spend by filtering events where plan: 'free' and ranking by cost.
  • Pinpoint bill spikes after a release by filtering events by userId and date range.
  • Calculate cost-to-revenue ratios per customer by joining telemetry data with your billing system using userId.
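The first two queries in the list above reduce to a group-and-sum followed by a sort. Here is a rough in-memory sketch, assuming each event has already been given a `costUsd` value; the `CostEvent` shape and both function names are hypothetical, and in practice your observability platform's query language would do this work.

```typescript
// Hypothetical shape of one tagged LLM event with a precomputed cost.
type CostEvent = { userId: string; plan: string; costUsd: number }

// Groups events by userId and sums their cost.
function spendByUser(events: CostEvent[]): Map<string, number> {
  const totals = new Map<string, number>()
  for (const event of events) {
    totals.set(event.userId, (totals.get(event.userId) ?? 0) + event.costUsd)
  }
  return totals
}

// Returns the highest-spending users, optionally restricted to one plan.
function topSpenders(
  events: CostEvent[],
  limit: number,
  plan?: string,
): { userId: string; costUsd: number }[] {
  const filtered = plan === undefined ? events : events.filter((e) => e.plan === plan)
  return Array.from(spendByUser(filtered).entries())
    .sort((a, b) => b[1] - a[1]) // highest spend first
    .slice(0, limit)
    .map(([userId, costUsd]) => ({ userId, costUsd }))
}
```

Calling `topSpenders(events, 10)` answers the first question, and `topSpenders(events, 10, 'free')` answers the second.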

These insights emerge naturally from your existing observability pipeline. No client-side analytics SDKs, no manual log parsing, and no prompt pollution.

Respecting privacy and multi-tenancy

Never embed personally identifiable information (PII) like email addresses or wallet IDs into telemetry metadata. Use stable internal identifiers such as user_a3f9c2 instead.

For multi-tenant applications, add a second tag: tenantId. This dual-tagging lets you answer both “Which customer is this?” and “Which of their users?”—critical for B2B SaaS environments.

Some observability platforms offer PII scrubbing, but don’t rely on it: the safest practice is to avoid ingesting sensitive data altogether.
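One way to enforce both rules is to build tags through a single function that attaches tenantId and refuses values that look like PII. This is a sketch under stated assumptions: `buildTags` and `looksLikeEmail` are hypothetical helpers, and the email pattern is a deliberately loose heuristic, not a complete PII detector.

```typescript
type AttributionTags = { userId: string; tenantId: string }

// Loose heuristic: anything shaped like name@host.tld is treated as an email.
function looksLikeEmail(value: string): boolean {
  return /[^\s@]+@[^\s@]+\.[^\s@]+/.test(value)
}

// Builds the dual tag set and rejects email-like values before they
// reach the telemetry pipeline.
function buildTags(user: { id: string; tenantId: string }): AttributionTags {
  const tags: AttributionTags = { userId: user.id, tenantId: user.tenantId }
  for (const [key, value] of Object.entries(tags)) {
    if (looksLikeEmail(value)) {
      throw new Error(`Tag "${key}" looks like an email address; use an internal ID instead`)
    }
  }
  return tags
}
```

Failing loudly at tag-construction time keeps the mistake visible in development instead of surfacing later as sensitive data in a third-party dashboard.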

The path forward for AI cost transparency

No matter your stack—Express, Fastify, Vercel AI SDK, or custom agents—consistent user attribution starts with stamping userId at the request or event boundary. From there, the metadata propagates through every LLM call, enabling real-time cost analysis without reinventing your infrastructure.

Open-source wrappers like @voightxyz/openai, @voightxyz/anthropic, and @voightxyz/vercel-ai simplify adoption. They integrate with popular observability platforms such as Langfuse, Arize Phoenix, Braintrust, and Datadog—so the same pattern works across your entire AI stack.

As AI usage grows, so does the cost risk. The sooner you implement per-user attribution, the fewer surprises you’ll face when the next invoice arrives.

AI summary

How do you track per-user costs in your AI applications? Keep your OpenAI spend under control and optimize your bills with three practical methods.
