Claude Code is the most popular AI coding agent in 2026 — but its real cost is poorly understood. Most developers know the subscription prices ($20 Pro, $200 Max), but the API-direct pricing tells a very different story. Here's the token-level breakdown that matters for anyone running agents at scale.
Model Pricing (Per Million Tokens)
| Model | Input | Output | Cached Input* | Context Window |
|---|---|---|---|---|
| Claude Opus 4.8 | $5.00 | $25.00 | $0.50 | 1M |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | 1M |
| Claude Haiku 4.5 | $0.80 | $4.00 | $0.08 | 200K |
*Cached input: Anthropic's prompt caching discount (90% off standard input price). Cache TTL: 5 minutes.
What this means per turn — a typical coding agent turn sends ~20,000 input tokens and generates ~1,000 output tokens:
| Model | Uncached | With Cache Hit |
|---|---|---|
| Opus 4.8 | $0.125 | $0.035 |
| Sonnet 4.6 | $0.075 | $0.021 |
| Haiku 4.5 | $0.020 | $0.006 |
That $0.075 per turn (Sonnet, uncached) sounds cheap. Now multiply by a real session.
The Session Multiplier
A typical Claude Code coding session has a specific token shape:
Turns per session: 30-100 (varies by task complexity)
Input tokens per turn: 15,000-50,000 (grows with conversation)
Output tokens per turn: 500-2,000
Cache hit rate: 40-60% (real-world, not theoretical)
| Session Type | Turns | Avg Input | Cache Rate | Cost (Sonnet 4.6) | Cost (Opus 4.8) |
|---|---|---|---|---|---|
| Quick fix | 15 | 12,000 | 60% | $0.47 | $0.81 |
| Feature (medium) | 50 | 25,000 | 50% | $3.23 | $5.56 |
| Feature (complex) | 100 | 40,000 | 40% | $11.10 | $18.60 |
| Full-day session | 200 | 50,000 | 35% | $31.93 | $53.75 |
A medium feature costs $3-6 in raw API tokens. That's the number to compare against the $200/month Max plan.
Subscription vs API: The Break-Even Math
| Plan | Monthly Cost | Equivalent API Tokens (Sonnet) | Who It's For |
|---|---|---|---|
| Pro ($20) | $20 | ~6 medium features | Light daily use |
| Max 5x ($100) | $100 | ~30 medium features | Professional daily use |
| Max 20x ($200) | $200 | ~60 medium features | Heavy all-day coding |
The catch: Max plans have usage caps. Anthropic doesn't publish exact limits, but heavy users report hitting them. If you exceed the cap, you either wait or switch to API — at which point you're paying full token prices.
Where the Cost Actually Goes
Instrumenting real Claude Code sessions reveals the token distribution:
| Token Source | % of Total | Cache-Sensitive? |
|---|---|---|
| System prompt + tool definitions | 18% | Yes (stable prefix) |
| Conversation history (accumulated) | 42% | No (grows every turn) |
| Tool outputs (grep, cat, typecheck, etc.) | 30% | No (new each turn) |
| Model output (code, explanations) | 10% | N/A |
The key insight: 72% of tokens come from conversation history and tool outputs — content that's unique to each session and can't be cached. This is the Agent Tax we wrote about in our previous article.
Practical Takeaways
-
Use Sonnet for coding, Opus for architecture. Opus costs 66% more per turn but is ~15% better on complex reasoning. For boilerplate code and refactoring, Sonnet is the right default.
-
Keep sessions focused. A 200-turn mega-session costs 6x more than four 50-turn sessions because of context accumulation. Close and reopen sessions between tasks.
-
Prompt caching saves 40-60%, not 90%. The theoretical maximum requires a perfectly stable prefix, which agent sessions rarely maintain.
-
The Max plan subsidy is real. If you're a heavy user, the $200/month Max plan is dramatically cheaper than equivalent API usage ($200 vs $30-50 in API tokens). Just don't hit the cap.
Tracking your actual Claude Code token costs? Synrouter gives you per-session cost visibility and reduces your API bill by up to 85% with session-lifetime caching — no agent-side changes required.