How much does Claude Code cost per month?

It depends on your usage: light hobby use (≤5 sessions/week) costs $5-15/month on API pay-as-you-go. Daily professional use costs $100/month with Claude Code Max 5x. Heavy all-day coding costs $200/month with Max 20x. Teams running autonomous agents can see $500-5,000+/month in API costs.

Claude Code API vs Max plan, which is cheaper?

For light users (<5 sessions/week), API pay-as-you-go is cheaper at $5-15/month. For daily professional use, Claude Code Max ($100-200/month) is cheaper because it includes a usage allowance. For heavy users exceeding Max limits, direct API with a caching layer is the most cost-effective.

How to calculate Claude Code token cost?

Count your input and output tokens per session. Input tokens include system prompt + tool schemas + conversation history + file contents. Multiply by the model's per-million-token rate. A typical 50-turn Sonnet session with 50K tokens/turn costs ~$2.25 in input + $7.50 in output = ~$9.75 per session without caching.

What is the most expensive part of a Claude Code session?

Re-sent context. 85-95% of tokens in each turn are identical to the previous turn (system prompt, tools, history). Without caching, you pay full price for these on every turn. With session-lifetime caching, these cost 10% of normal price — an 85% savings.

Did Claude Code pricing change in 2026?

Anthropic has not changed Claude Code subscription pricing ($20/$100/$200 tiers). However, API token pricing for individual models has shifted as new models launched. Opus 4.8 is $15/$75 per million tokens, while Sonnet 4.6 is $3/$15. The agent tax means real costs are always higher than the headline per-token price suggests.

Claude Code API Pricing 2026: Real Costs & How to Cut Your Bill 85%

Last updated: June 15, 2026. We update this whenever Anthropic ships new pricing or a new model.

We get the same question every week: if I use Claude Code seriously, what does it actually cost me?

The honest answer is that nobody can tell you from the pricing page. Claude Code Pro is "$20 a month." Sonnet 4.6 is "$3 per million input tokens." Both numbers are correct. Neither of them survives contact with a real coding session.

We've been running Claude Code in production at Synrouter. We've instrumented sessions, measured token shapes, watched bills land. The gap between the headline price and the real bill turned out to be bigger than we expected.

This post is the writeup of what we learned. The math. The hidden multipliers. The levers that actually move the bill.

If you've got 60 seconds, jump to the verdict below. Otherwise every section reads standalone — skip around.

The Quick Verdict

For most teams, this is what real bills look like:

Usage profile	Best plan	Typical monthly cost	Worst-case cost (API only)
Light hobby use (≤5 sessions/week)	API pay-as-you-go	$5-15	$20
Daily professional use	Claude Code Max 5x ($100)	$100	$300-500
Heavy all-day coding	Claude Code Max 20x ($200)	$200	$800-1500
Team / autonomous agent fleet	Direct API + caching layer	$500-5000+	Linear in usage

The cheapest path comes down to one thing: how predictable is your usage?

Max subscriptions only pay off if you reliably eat the capacity you bought. Going direct to the API only pays off if you have caching and routing infrastructure in front of it. Without that infrastructure, multi-turn sessions burn through tokens faster than most people expect.

The rest of this post is why.

Model Pricing (Per Million Tokens)

Current Anthropic API prices, as of June 2026:

Model	Input	Output	Cached Input*	Cache Write**	Context Window
Claude Opus 4.8	$5.00	$25.00	$0.50	$6.25	1M
Claude Sonnet 4.6	$3.00	$15.00	$0.30	$3.75	1M
Claude Haiku 4.5	$0.80	$4.00	$0.08	$1.00	200K

* Cached input — 90% off the standard input price for tokens read from cache. Cache TTL is 5 minutes by default, which turns out to matter a lot.

** Cache write — a 25% surcharge over base input the first time a prefix gets cached. Amortized across reads, so it's fine if the cache actually gets reused.

What this means per turn

A typical Claude Code turn pushes ~20,000 input tokens and gets back ~1,000 output tokens:

Model	Uncached	First write (cached)	Subsequent cache hit
Opus 4.8	$0.125	$0.150	$0.035
Sonnet 4.6	$0.075	$0.090	$0.021
Haiku 4.5	$0.020	$0.024	$0.006

$0.075 per turn sounds cheap. Here's the trap: real coding sessions are never 1 turn.

The Session Multiplier

When we instrumented our own Claude Code sessions, the token shape was remarkably consistent:

Turns per session:     30-100 (varies by task complexity)
Input tokens per turn: 15,000-50,000 (grows with conversation)
Output tokens per turn: 500-2,000
Cache hit rate:        40-60% (real-world, not theoretical 90%+)

Plug those into per-turn cost and the real session bill shows up:

Session type	Turns	Avg input	Cache rate	Sonnet 4.6	Opus 4.8
Quick fix (one bug)	15	12,000	60%	$0.47	$0.81
Feature (medium)	50	25,000	50%	$3.23	$5.56
Feature (complex)	100	40,000	40%	$11.10	$18.60
Full-day session	200	50,000	35%	$31.93	$53.75

A medium feature runs $3-6. That's one feature. That's the number you should be comparing against your $100-200/month Max plan, not the $0.075-per-turn fantasy.

Why is the cache hit rate so low? Agent sessions append to context every single turn. Tool output gets appended, your reply gets appended, the agent rewrites its own plan halfway through. The "cacheable prefix" keeps shrinking as a fraction of the request. We unpacked this dynamic in The Agent Tax: Why Your AI Agent Costs 10x More Than You Expected.

Subscription vs API: The Break-Even Math

Three Claude Code tiers, and where each actually breaks even against API pricing:

Plan	Monthly cost	Equivalent API tokens (Sonnet)	Real-world break-even	Who it's for
Pro ($20)	$20	~6 medium features	3-5 medium features	Light daily use
Max 5x ($100)	$100	~30 medium features	12-20 medium features	Professional daily use
Max 20x ($200)	$200	~60 medium features	25-40 medium features	Heavy all-day coding

Notice that the "real-world" break-even is lower than the theoretical one. That's because Max plans have soft usage caps. Anthropic doesn't publish the limits, but heavy users hit them. Regularly. We've hit them.

When you hit the cap, you wait until reset or fall back to direct API — and now you're paying full token prices on top of your subscription.

For most professional developers, Max 5x is the sweet spot. You get predictable billing with meaningful headroom. You don't pay for capacity you'll never use.

Should you go 20x? Only if one of three things is true: you've got multiple coders sharing the account, you run autonomous overnight agents, or last month's API bill told you you'd otherwise pay $300+.

Where the Cost Actually Goes

We instrumented this. Here's where the tokens actually disappear:

Token source	% of total	Cache-sensitive?
System prompt + tool definitions	18%	Yes (stable prefix)
Conversation history (accumulated)	42%	No (grows every turn)
Tool outputs (grep, cat, typecheck, npm install...)	30%	No (new each turn)
Model output (code, explanations)	10%	N/A

72% of tokens are unique to the session. They can't be cached. They can't be deduped across users. They grow linearly with how long you stay in the session. The 18% that is cacheable is the part everyone optimizes — which is fine, but it's not where the money is.

The biggest line item is the most reducible one: tool outputs at 30%. A grep -r against a real repo, an npm install log, a pytest traceback — these dump thousands of tokens of mostly low-signal text into context. We benchmarked it and found agents routinely waste 60%+ of their tool-output budget on ANSI codes, progress spinners, and duplicate log lines. Full breakdown in Your Agent's Tool Outputs Are Wasting 60% of Your Token Budget.

Why Claude Code Specifically Is Expensive at Scale

Three structural reasons. None of them are bugs.

1. Multi-turn by design. Claude Code is built around iteration — plan, read files, draft code, run tests, fix, repeat. Each turn re-bills the entire accumulated context. A vanilla one-shot completion API call would have charged you once. Claude Code charges you 20-100 times.

2. Hot context window. Claude Code keeps a wide view of the repo loaded by default. System prompt, tool definitions, file contents, recent diffs — that's typically 15,000-30,000 tokens before you even type anything. Every turn pays that floor again.

3. The 5-minute cache TTL. This is the one that gets people.

Anthropic's caching was clearly designed for chat sessions, where users reply in seconds. But real coding agents pause for minutes between turns. You read the diff. You try the change. You come back with feedback. By the time you reply, the cache is gone.

This is the single biggest gap between the marketed savings (90%) and what we observe in production (40-60%). We wrote up the workaround patterns in The 5-Minute TTL: How Anthropic's Prompt Cache Quietly Broke Long-Running Agents.

None of this is a flaw. These are consequences of how agents work. But they're also why your $20/month Pro plan runs out of capacity mid-week, and why teams running Claude Code via the API see bills that scale linearly with headcount.

A reality check from the wild: OpenClaw — Peter Steinberger's 100-agent Codex fleet — burned 603 billion tokens in 30 days, for a $1.3M monthly bill. Extreme, sure. But the math scales down predictably. An autonomous agent doing one developer's worth of work for one month at unoptimized rates lands somewhere around $400-800.

How to Cut Your Claude Code Bill — Practical Levers

In order of how much they actually move the bill:

1. Pick the right model per task (saves 40-60%)

Defaulting to Opus 4.8 is the single most expensive habit we see. Opus is genuinely better at architecture and ambiguous design work. But for the 70% of coding tasks that are refactors, boilerplate, or implementing well-specified features, Sonnet 4.6 produces near-identical output at 40% of the price.

Cheaper still: Haiku 4.5 for high-volume, low-stakes work. Log analysis, typo fixes, comment generation, doc extraction. It's 6x cheaper than Sonnet on output and surprisingly capable for routine code.

We dug into routing strategy in Codex vs Claude Code: Why "Pick One" Is the Wrong Question.

2. Keep sessions focused (saves 30-50%)

A single 200-turn session costs roughly 6x more than four 50-turn sessions doing the same amount of work. Why? Context accumulation. By turn 150 you're carrying 80,000 tokens of conversation history that get re-billed every single turn. Most of it has nothing to do with what you're working on right now.

The rule we follow: close and restart at every natural task boundary. Done writing the feature? Start fresh for the tests. Tests pass? Start fresh for the PR description.

3. Use prompt caching aggressively (saves 40-60%, not 90%)

Prompt caching works well when:

Your system prompt is stable across turns
Tool definitions don't change mid-session
The cacheable prefix is large (≥1024 tokens for cache to kick in)

It falls apart when:

The agent edits its own system prompt
You pause >5 minutes between turns
Each turn injects different file contents at the top

The most overlooked optimization: place your cache_control breakpoints intentionally. Anthropic gives you 4 of them per request. Put them at the end of stable sections — system prompt, tool definitions, retrieved docs. That's dramatically more effective than letting auto-caching figure it out.

4. Filter tool outputs (saves 15-30%)

Most agents pipe raw grep, cat, and npm install output straight into context. That output is full of:

ANSI escape codes (\x1b[31m...)
Duplicate progress bars (5%, 10%, 15%...)
Redundant stack frames
Verbose package install logs (added 1247 packages)

Stripping this in your tool layer — before it ever hits the model — typically cuts tool-output tokens in half. For agents where tool outputs are 30% of spend, that's a 15% reduction in your bill. From doing nothing the model can see.

5. Pool caches across sessions (saves 30-50% for teams)

Prompt caching is per-API-key by default. So if you've got 10 developers all running similar Claude Code workflows, each one pays full cache-write cost the first time, every session. A shared caching proxy that maintains warm caches across the team multiplies your effective cache hit rate.

This is what we built Synrouter to do — session-lifetime caching that survives Anthropic's 5-minute TTL, shared across the team, no agent-side changes. On teams of 10+ doing similar work, we've measured 80%+ reduction in the monthly bill.

Comparison: Claude Code vs Other Coding Agents (June 2026 Snapshot)

For context, here's how Claude Code stacks up against the rest of the field:

Agent	Underlying model	Headline subscription	Per-turn API cost (uncached)
Claude Code	Anthropic Opus/Sonnet	$20-200/mo	$0.075-$0.125
Cursor	Multiple (configurable)	$20-40/mo	$0.040-$0.100
Codex CLI	OpenAI GPT-5/o1	Plus $20/mo + API	$0.060-$0.180
Aider	Any (configurable)	Free + API	Depends on model

The headline numbers lie. We're working on a full comparison post with apples-to-apples session-level measurement — Claude Code vs Cursor vs Codex Pricing — coming next week.

Short version: picking one agent is the wrong frame. The cost-efficient teams route different task types to different agents, and that routing is where the real savings live.

Frequently Asked Questions

Is Claude Code cheaper than Cursor?

At the headline level, Cursor Pro ($20) and Claude Code Pro ($20) are a wash. Real cost depends on what you're doing. Cursor's autocomplete is cheaper for high-volume small completions. Claude Code is cheaper for multi-turn agent workflows because it has better prompt caching support.

Can I use Claude Code without an Anthropic API key?

Yes — with a Claude Code subscription (Pro/Max), Anthropic manages the capacity. Without a subscription, you bring your own ANTHROPIC_API_KEY and pay per-token directly.

What does a typical PR review cost?

For a moderate PR (10-20 files, 500 lines changed), a thorough review is usually 15-30 turns, around 200,000-400,000 input tokens at a 50% cache hit rate. So $0.30-$0.80 on Sonnet 4.6, $0.50-$1.30 on Opus 4.8.

Does Claude Code charge for failed responses?

Yes. Anthropic bills on tokens processed, not on outcomes. If Claude Code writes code that fails tests and you ask it to fix that, both turns are billed. Which is one more reason to prefer short focused sessions over long debugging marathons.

What's the cheapest way to run Claude Code at team scale?

For teams of 5+ developers daily, the cheapest path is direct API access through a caching/routing proxy. The proxy handles session-lifetime caching (around Anthropic's 5-minute TTL), per-task model routing, and shared cache pools across the team. Synrouter is one such proxy.

How often does Anthropic change pricing?

Anthropic has historically held pricing stable for 6-12 months after a model release, then adjusted on the next generation. Cache pricing has been stable since it shipped. We update this guide whenever it moves.

Updates Log

2026-06-15 — Added FAQ section, comparison table, and expanded "How to Cut Your Bill" with five practical levers.
2026-06-04 — Initial publication with model pricing and session-level breakdown.

See also: Supported Models & Pricing — full model catalog with per-token rates and provider availability.

We built Synrouter because our own Claude Code bill kept doing things we didn't expect. It gives you per-session cost visibility, session-lifetime caching that survives Anthropic's 5-minute TTL, and team-wide cache pooling — no agent-side code changes. If any of the patterns in this post sound familiar, hop on the waitlist.