The aggregation problem
Most teams start with a single bill from their LLM provider:
- Total monthly spend: $47,293
- Input tokens: 2.3B
- Output tokens: 890M
That’s it. One number. No context. No way to know what’s working and what’s hemorrhaging money. You can’t answer basic questions:
- Which product feature costs the most?
- Is the chatbot more expensive than the code assistant?
- Did that optimization last week actually work?
- Which team owns the spending spike?
The fundamentals: Input + output tokens
Good news: LLM billing is simpler than cloud billing. Almost everything comes down to two core items:
- Input tokens: What you send to the model (prompts, context, documents)
- Output tokens: What the model generates back
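Given those two numbers and your provider’s per-token prices, cost is simple arithmetic. A minimal sketch, with hypothetical prices (substitute your provider’s real rates):

```python
# Hypothetical per-million-token prices; substitute your provider's rates.
PRICE_PER_M_INPUT = 3.00    # USD per 1M input tokens (assumed)
PRICE_PER_M_OUTPUT = 15.00  # USD per 1M output tokens (assumed)

def llm_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost for one request or an aggregate of requests."""
    return (input_tokens / 1_000_000) * PRICE_PER_M_INPUT \
         + (output_tokens / 1_000_000) * PRICE_PER_M_OUTPUT

# The token volumes from the example bill: 2.3B input, 890M output.
monthly = llm_cost(2_300_000_000, 890_000_000)
print(f"${monthly:,.2f}")  # $20,250.00
```

Note the total here differs from the $47,293 bill above because the prices are made up; the point is the formula, not the rates.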
Break down the spending
The goal is to move from aggregate numbers to granular insights. Here’s the progression:
Level 1: Organization-wide
❌ Total: $47,293/month across all models and applications
You have no idea what to optimize. Every feature looks equally responsible.
Level 2: By app or product
- Customer chatbot: $28,400 (60%)
- Code assistant: $12,900 (27%)
- Email classifier: $5,993 (13%)
Level 3: By feature or use case
- Customer chatbot
- Live support: $18,200 (64%)
- FAQ responses: $7,100 (25%)
- Conversation summaries: $3,100 (11%)
Level 4: By owner and cost per outcome
- Live support (Customer Success Team)
- Monthly spend: $18,200
- Conversations handled: 4,320
- Cost per conversation: $4.21
- Owner: Sarah Chen (VP Customer Success)
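The Level 4 numbers are just unit economics: divide spend by outcomes. A one-liner makes the arithmetic explicit:

```python
def cost_per_outcome(monthly_spend: float, outcomes: int) -> float:
    """Unit economics: dollars spent per business outcome (conversation, ticket, ...)."""
    return monthly_spend / outcomes

# Live support, from the Level 4 breakdown above.
print(round(cost_per_outcome(18_200, 4_320), 2))  # 4.21
```

Once every feature has a cost-per-outcome number, you can compare features on value delivered instead of raw spend.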
How to track spending at the source
You need infrastructure that attributes costs automatically. Here are the three most common approaches:
1. Resource tagging
Tag every LLM-related resource (API keys, endpoints, services) with metadata:
Pros: Automatic attribution once configured
Cons: Requires disciplined tagging from day one
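One way to sketch tagging, assuming you record which API key produced each usage record (all key names and tags below are illustrative):

```python
# Minimal tagging sketch: every LLM resource carries team/feature metadata
# at creation time, so raw usage rolls up to owners automatically.
from collections import defaultdict

RESOURCE_TAGS = {  # illustrative keys and tags
    "key-chatbot":    {"team": "customer-success", "feature": "live-support"},
    "key-codeassist": {"team": "engineering",      "feature": "code-assistant"},
}

def attribute(usage_records):
    """Roll raw usage records up to per-team spend using the resource tags."""
    totals = defaultdict(float)
    for rec in usage_records:
        tags = RESOURCE_TAGS.get(rec["api_key"], {"team": "untagged"})
        totals[tags["team"]] += rec["cost_usd"]
    return dict(totals)

usage = [
    {"api_key": "key-chatbot",    "cost_usd": 12.50},
    {"api_key": "key-codeassist", "cost_usd": 4.75},
    {"api_key": "key-chatbot",    "cost_usd": 3.25},
]
print(attribute(usage))  # {'customer-success': 15.75, 'engineering': 4.75}
```

The `untagged` bucket is the important design choice: it surfaces undisciplined tagging instead of silently dropping spend.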
2. Application-specific endpoints
Create separate API keys or proxy endpoints for each application:
- api.yourcompany.com/chatbot → Customer chatbot
- api.yourcompany.com/code-assist → Code assistant
- api.yourcompany.com/classifier → Email classifier
Pros: Immediate visibility with no code changes
Cons: Requires some infrastructure work upfront
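The routing itself can be as small as a lookup table in a thin proxy. A sketch, with made-up paths and key names:

```python
# Illustrative route table for a thin proxy: each path maps to an application
# label and its own upstream API key, so every request is attributable.
ROUTES = {
    "/chatbot":     {"app": "customer-chatbot", "api_key": "key-chatbot"},
    "/code-assist": {"app": "code-assistant",   "api_key": "key-codeassist"},
    "/classifier":  {"app": "email-classifier", "api_key": "key-classifier"},
}

def route(path: str) -> dict:
    """Resolve an incoming proxy path to its application and upstream key."""
    if path not in ROUTES:
        raise ValueError(f"Unknown endpoint: {path}")
    return ROUTES[path]

print(route("/chatbot")["app"])  # customer-chatbot
```

Because each application gets its own upstream key, the provider’s billing export is already segmented before you do any analysis.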
3. Tracing
Wrap your LLM calls with middleware that logs metadata and traces execution:
- OpenTelemetry: Industry-standard observability framework that captures spans, traces, and metrics across your LLM pipeline
- LangSmith: Purpose-built for LLM applications, tracks prompts, completions, latency, and costs
- Langfuse: Open-source LLM observability with automatic cost tracking and evaluation workflows
- Arize Phoenix: Monitors model performance, token usage, and traces multi-step agent workflows
Pros: Maximum flexibility, control, and deep observability
Cons: Most engineering work required; needs integration with your existing monitoring stack
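For a sense of what such middleware does under the hood, here is a hand-rolled sketch (not any particular vendor’s SDK) that records feature, latency, and token counts for every wrapped call:

```python
# Hand-rolled tracing sketch: a decorator that logs metadata per LLM call.
# Real deployments would ship these records to OpenTelemetry, Langfuse, etc.
import functools
import time

TRACE_LOG = []

def traced(feature: str):
    """Decorator that records metadata for every wrapped LLM call."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)  # expected to return token counts
            TRACE_LOG.append({
                "feature": feature,
                "latency_s": time.perf_counter() - start,
                "input_tokens": result["input_tokens"],
                "output_tokens": result["output_tokens"],
            })
            return result
        return inner
    return wrap

@traced("live-support")
def fake_llm_call(prompt: str):
    # Stand-in for a real provider call; returns a usage-style dict.
    return {"text": "...", "input_tokens": len(prompt.split()), "output_tokens": 42}

fake_llm_call("How do I reset my password?")
print(TRACE_LOG[0]["feature"])  # live-support
```

The trace records are deliberately plain dicts: any of the tools listed above can ingest something shaped like this.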
Identify the owners
Every dollar of LLM spending should have a clear owner, someone who:
- Understands the use case and user experience
- Can make tradeoffs between cost, quality, and speed
- Has authority to approve changes
Depending on the application, that owner is typically:
- A product manager (for user-facing features)
- An engineering lead (for internal tooling)
- A team lead (for department-specific applications)
- You (if it’s your project)
Capture the assignments in a simple ownership table:
| Application/Feature | Owner | Team | Monthly Spend | Priority |
|---|---|---|---|---|
| Chatbot - Live support | Sarah Chen | Customer Success | $18,200 | High |
| Chatbot - FAQ | Sarah Chen | Customer Success | $7,100 | Medium |
| Code assistant | Alex Rivera | Engineering | $12,900 | High |
| Email classifier | Jordan Kim | Operations | $5,993 | Low |
Review the table with stakeholders and pressure-test:
- The spending breakdown (does it look right?)
- The ownership assignments (is Alex the right person for code assist?)
- The optimization priorities (should we start with live support or the code assistant?)