The question nobody can answer
Ask any team running AI in production: “What does it cost you to resolve one support ticket with AI?”
Most cannot answer. They know their monthly OpenAI bill. They might know their average tokens per request. But they do not know cost per task, cost per successful outcome, or cost per user segment.
That is the gap AI FinOps closes.
The FinOps Foundation now has a dedicated AI overview. Their core point: AI needs the same cost discipline as cloud, but with different meters, faster cost shifts, and much weaker native visibility. They recommend tracking usage regularly, setting quotas, tagging resources, and aligning spend with outcomes.
Why token tracking is not enough
| What teams track | What they should track |
|---|---|
| Tokens per request | Cost per resolved ticket |
| Monthly API spend | Cost per successful outcome |
| Average latency | Cost per revenue-generating action |
| Model calls per day | Cost per user segment per month |
Token tracking tells you how much text moved through a model. It does not tell you whether the workflow was worth it.
A long, expensive call that resolves a $500 problem may be fine. A cheap call that produces a wrong answer may be disastrous. A token-efficient model that increases retries can cost more than a larger model that gets it right the first time.
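The retry-and-escalation effect can be made concrete with a small back-of-the-envelope calculation. All figures below are hypothetical, purely to illustrate why per-token price alone is misleading: once you price in what a failed resolution costs, the "cheap" model can be the expensive one.

```python
# Hypothetical per-call costs and success rates for two models.
SMALL = {"cost_per_call": 0.0004, "success_rate": 0.70}
LARGE = {"cost_per_call": 0.0040, "success_rate": 0.95}

# Assumed cost of escalating a ticket the AI fails to resolve to a human.
HUMAN_ESCALATION_COST = 8.00

def expected_cost_per_ticket(model):
    """Model spend plus the expected cost of escalating failures to a human."""
    failure_rate = 1 - model["success_rate"]
    return model["cost_per_call"] + failure_rate * HUMAN_ESCALATION_COST
```

With these numbers, the small model costs about $2.40 per ticket in expectation against roughly $0.40 for the large one, even though its per-call price is 10x lower.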
The five-step framework
Step 1: Define unit economics first
Pick one primary unit per use case:
| Use case | Primary unit | Success definition |
|---|---|---|
| Support | Per ticket | Resolved without human escalation |
| Sales | Per lead | Qualified, accepted by SDR |
| Search | Per query | Answer accepted or clicked |
| Code review | Per review | Patch merged without rewrite |
| Document processing | Per document | Extracted fields match validation |
If you cannot define the unit, you cannot measure whether AI is worth it.
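Once the unit is defined, the metric falls out of your call logs. A minimal sketch, assuming a log where each row carries a ticket id, the call's cost, and whether it resolved without escalation (all values hypothetical):

```python
# Hypothetical call log: (ticket_id, cost_usd, resolved_without_escalation)
calls = [
    ("T1", 0.04, True),
    ("T1", 0.03, True),   # a second call on the same ticket counts toward it
    ("T2", 0.05, False),  # escalated: spend with no successful unit
    ("T3", 0.02, True),
]

def cost_per_resolved_ticket(calls):
    """Total AI spend divided by the number of tickets resolved without escalation."""
    spend = sum(cost for _, cost, _ in calls)
    resolved = {t for t, _, ok in calls if ok}
    escalated = {t for t, _, ok in calls if not ok}
    resolved -= escalated  # a ticket that ever escalated is not a success
    return spend / len(resolved)
```

Note that the denominator is successful units, not calls: spend on escalated tickets still counts in the numerator, which is exactly what token-level metrics hide.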
Step 2: Instrument the full request path
Every AI call should carry:
| Field | Purpose |
|---|---|
| user_id / account_id | Cost attribution |
| task_type | Workflow-level analysis |
| model | Catch misrouting |
| prompt_tokens + completion_tokens | Raw cost |
| tool_calls | Agent complexity |
| retries | Silent multiplier |
| cache_hit | Optimization tracking |
| outcome_status | Success/failure/fallback |
| business_outcome | Revenue connection |
Without this telemetry, AI monitoring is just prettier logs.
Step 3: Set budgets at the workflow level
Do not set one giant AI budget for the whole company. Set budgets for:
| Level | Why |
|---|---|
| Feature | Isolate experimental from stable workloads |
| Team | Accountability |
| Environment | Dev/staging should not eat production budget |
| Customer tier | Enterprise and free tiers cost differently |
| Task class | Routing optimization |
One experimental workflow should not silently eat the entire budget.
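Workflow-level budgets can start as nothing more than a spend accumulator keyed by the levels above. A minimal sketch with hypothetical budget figures; a real system would alert or throttle on breach rather than return a boolean:

```python
from collections import defaultdict

# Hypothetical monthly budgets keyed by (feature, environment).
budgets = {
    ("search_feature", "prod"): 500.0,
    ("search_feature", "staging"): 50.0,
    ("agent_experiment", "prod"): 100.0,
}
spend = defaultdict(float)

def record_and_check(feature: str, env: str, cost: float) -> bool:
    """Accumulate spend for one workflow and report whether it is in budget.
    Unknown (feature, env) pairs default to a zero budget, so untagged
    workloads fail the check immediately instead of spending silently."""
    key = (feature, env)
    spend[key] += cost
    return spend[key] <= budgets.get(key, 0.0)
```

The zero-default is the point: an experimental workflow nobody budgeted for trips the check on its first dollar instead of eating the shared pool.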
Step 4: Review weekly, not quarterly
AI costs move too fast for quarterly, or even monthly, accounting. Weekly reviews should answer:
| Question | Action if concerning |
|---|---|
| Which task class grew most? | Check for scope creep or new usage patterns |
| Which model is overused? | Route cheaper tasks to smaller models |
| Which retries are rising? | Fix prompts or add fallback logic |
| Which users are disproportionately expensive? | Tier access or add caching |
| Which cache opportunities were missed? | Extend cache TTL or add cache for repeated prompts |
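The first review question, which task class grew most, falls directly out of the telemetry. A sketch with hypothetical weekly spend figures aggregated by `task_type`:

```python
# Hypothetical weekly spend per task class from your telemetry store.
last_week = {"support": 120.0, "search": 80.0, "code_review": 40.0}
this_week = {"support": 130.0, "search": 190.0, "code_review": 45.0}

def fastest_growing(prev: dict, cur: dict) -> str:
    """Task class with the largest relative week-over-week spend growth."""
    growth = {
        task: (cur.get(task, 0.0) - spend) / spend
        for task, spend in prev.items()
        if spend > 0
    }
    return max(growth, key=growth.get)
```

Here `search` more than doubled while the others grew single-digit percentages, which is exactly the scope-creep signal a weekly review exists to catch.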
Step 5: Tie cost to product decisions
If AI cost per user is rising, the response is not “find a cheaper model.” It is:
| Question | What it really asks |
|---|---|
| Can we shorten prompts? | Are we sending unnecessary context? |
| Can we gate expensive paths? | Should every user get the frontier model? |
| Can we use a smaller model first? | Is routing in place? |
| Can we shift work to async? | Does the user need real-time? |
| Can we avoid model calls entirely? | Is a rule or cache enough? |
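The "smaller model first" question presumes a routing layer exists. A minimal sketch of one; the model callables and the confidence signal are placeholders for your own stack, not any particular provider's API:

```python
from typing import Callable

# A model is any callable returning (reply, confidence in [0, 1]).
Model = Callable[[str], tuple[str, float]]

def answer(prompt: str, small: Model, large: Model, threshold: float = 0.8) -> str:
    """Try the cheap model first; escalate to the expensive one only
    when its self-reported confidence falls below the threshold."""
    reply, confidence = small(prompt)
    if confidence >= threshold:
        return reply                  # cheap path handled it
    reply, _ = large(prompt)          # gated expensive path
    return reply
```

The design choice worth noting is that the gate lives in application code, not in the model: which users and task classes ever reach the frontier model becomes an explicit product decision.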
FinOps is not reporting. It is product design.
Monitoring tools comparison
| Tool | Strength | Best for |
|---|---|---|
| LangSmith | LangChain integration, tracing, debugging, evals | Teams already using LangChain |
| Helicone | API proxy, cost tracking, routing, caching | Quick cost visibility |
| Portkey | Gateway, routing, fallbacks, guardrails | Infrastructure-focused teams |
| Langfuse | Open source, self-hostable, tracing + cost | Teams wanting control |
| Arize Phoenix | Eval and experiment workflows | Deep analysis needs |
The exact tool matters less than the pattern: capture every call, group calls into tasks, record outcomes, connect spend to business value.
The cost paradox
Inference is getting cheaper per token. Production spend often rises anyway.
| Year | Trend | Effect |
|---|---|---|
| 2024 | Token prices dropped 5-10x | Teams added more agent steps |
| 2025 | Caching reduced repeated costs | Teams built longer context workflows |
| 2026 | Model routing became standard | Teams started using AI for more task types |
Cheaper unit price encourages more usage. That is the classic FinOps paradox, and it applies to AI exactly as it did to cloud.
The bottom line
If you run AI in production without knowing cost per task, you are flying blind.
The fix is operational, not complex:
| Action | Impact |
|---|---|
| Define task-level units | Know what you are optimizing for |
| Instrument every call | Know what you spend and where |
| Route by value density | Stop sending everything to the most expensive model |
| Cache aggressively | Stop paying for the same work twice |
| Cap retries | Stop silent budget leaks |
| Review weekly | Catch problems before they compound |
| Tie spend to outcomes | Know whether AI is worth it |
The teams that build this layer now will know exactly where AI pays for itself, and where it does not. The teams that wait will discover their real AI cost the way most do: when the first billing spike arrives and nobody can explain it.