Claude Code bills are mostly Opus tokens spent on work that a far cheaper model could have done. The fix has two levers: optimize what Claude Code already does (caching, context hygiene, model tiers), then add an LLM router that sends easy tasks to a cheap model and reserves the expensive one for hard reasoning. Below are the open-source routers actually used for this in 2026 — each with its real GitHub repo, current star count (verified via the GitHub API, June 2026), what it's best for, and the honest catch.
TL;DR
- Measure first: ccusage (~16.7k★) reads Claude Code's local logs and shows cost per day/model/session. You can't cut what you can't see.
- Route Claude Code itself: claude-code-router (~35k★) sends background/simple tasks to cheaper providers (DeepSeek, Qwen, etc.) and keeps a strong model for reasoning.
- Team gateway: LiteLLM (~52k★) — one API for 100+ models with budgets, fallbacks and cost tracking.
- The honest catch: routing "saves money" by switching to different, cheaper models — not by making Claude itself cheaper. That's a quality trade-off you must measure, not assume.
Why Claude Code gets expensive
Most spend isn't the hard problems — it's volume. Every file read, every retry, every "fix this typo" runs through the same expensive model with a growing context window. Two things compound the bill: large contexts (you pay for every token in the window on every turn) and using a top-tier model for trivial work. Routers attack the second; good context hygiene attacks the first.
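To make the "you pay for every token in the window on every turn" point concrete, here is a minimal sketch of the arithmetic. The token counts are made-up placeholders, and it deliberately ignores prompt caching, so treat it as an illustration of the growth pattern rather than a billing model.

```python
# Illustrative only: all token counts are hypothetical, and caching is ignored.

def cumulative_input_tokens(base_context: int, tokens_added_per_turn: int, turns: int) -> int:
    """Total input tokens billed when the whole window is resent on every turn."""
    total = 0
    window = base_context
    for _ in range(turns):
        total += window                    # the full window is billed this turn
        window += tokens_added_per_turn    # history grows before the next turn
    return total

# One 40-turn session vs. the same work split into two 20-turn sessions
# (i.e. the history is cleared halfway through).
long_session = cumulative_input_tokens(base_context=20_000, tokens_added_per_turn=5_000, turns=40)
two_short_sessions = 2 * cumulative_input_tokens(base_context=20_000, tokens_added_per_turn=5_000, turns=20)

print(f"one long session:   {long_session:,} input tokens")
print(f"two short sessions: {two_short_sessions:,} input tokens")
```

Under these assumed numbers, clearing the history once cuts billed input tokens from 4.7M to 2.7M for the same number of turns, which is why context hygiene comes before routing.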
Step 0 — Measure before you optimize
ccusage (~16.7k★) is a CLI that parses Claude Code's local JSONL logs and breaks cost down by day, week, month, session and model — including cache-creation vs cache-read tokens and the 5-hour billing window. Run it before and after any change so you optimize against real numbers, not vibes.
```shell
npx ccusage@latest   # daily cost report from your local Claude Code logs
```
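If you want to see roughly what this kind of report involves, the sketch below aggregates cost per day and model from JSONL text. The field names (`timestamp`, `model`, `cost_usd`) and the sample records are hypothetical; they are not the actual Claude Code log schema, and this is not how ccusage is implemented.

```python
import json
from collections import defaultdict

# Hypothetical sample records; real Claude Code logs use a different schema.
SAMPLE_LOG = """\
{"timestamp": "2026-06-01T09:15:00Z", "model": "opus", "cost_usd": 1.20}
{"timestamp": "2026-06-01T10:02:00Z", "model": "haiku", "cost_usd": 0.05}
{"timestamp": "2026-06-02T08:30:00Z", "model": "opus", "cost_usd": 0.75}
"""

def cost_by_day_and_model(jsonl_text: str) -> dict:
    """Sum the cost of each record, keyed by (YYYY-MM-DD, model)."""
    totals = defaultdict(float)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        record = json.loads(line)
        day = record["timestamp"][:10]    # ISO timestamp -> date part
        totals[(day, record["model"])] += record["cost_usd"]
    return dict(totals)

report = cost_by_day_and_model(SAMPLE_LOG)
for (day, model), cost in sorted(report.items()):
    print(f"{day}  {model:<6} ${cost:.2f}")
```

The useful part is the grouping key: once cost is broken down per day and per model, it becomes obvious which model accounts for most of the bill.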
Step 1 — Native optimization (no quality loss)
Before routing anything, harvest the free wins inside Claude Code — these keep you on Anthropic models, so there's no quality trade-off:
- Prompt caching: reused context (system prompt, files) is billed at a fraction of input price on cache reads.
- Context hygiene: `/compact` and `/clear` to drop stale history; smaller windows = fewer tokens per turn.
- Model tiers: use Haiku/Sonnet for routine edits and reserve Opus for genuinely hard reasoning.
I covered these in depth in the Claude Code token-optimization guide — start there, because routing on top of a bloated context just moves the waste to a cheaper bill.
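A rough sketch of why cache reads matter, under loudly stated assumptions: the input price and the cache-read multiplier below are placeholders (check current pricing for real values), and the one-time cost of writing to the cache is ignored.

```python
# Assumptions: PRICE_PER_MILLION and CACHE_READ_MULTIPLIER are placeholder
# values, not quoted prices. Cache-write cost is left out for simplicity.
PRICE_PER_MILLION = 10.0       # hypothetical USD per 1M input tokens
CACHE_READ_MULTIPLIER = 0.1    # assumed fraction of input price for cache reads

def turn_input_cost(cached_tokens: int, fresh_tokens: int) -> float:
    """Input cost of one turn, with cached tokens billed at the reduced rate."""
    cached = cached_tokens / 1_000_000 * PRICE_PER_MILLION * CACHE_READ_MULTIPLIER
    fresh = fresh_tokens / 1_000_000 * PRICE_PER_MILLION
    return cached + fresh

# A 100k-token window where 90k tokens (system prompt, files) are unchanged.
without_cache = turn_input_cost(cached_tokens=0, fresh_tokens=100_000)
with_cache = turn_input_cost(cached_tokens=90_000, fresh_tokens=10_000)

print(f"no cache hits: ${without_cache:.2f} per turn")
print(f"90% cached:    ${with_cache:.2f} per turn")
```

The model output is identical in both cases, which is what makes this a saving with no quality trade-off.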
Step 2 — The LLM routers (route easy work to cheap models)
1. claude-code-router — route Claude Code's own traffic
The most direct option for this use case. musistudio/claude-code-router (~35k★, MIT) is a local gateway (default http://127.0.0.1:3456) that intercepts Claude Code requests and routes them by category — background tasks, reasoning, long-context, web-search — to whichever provider you choose: OpenRouter, DeepSeek, Qwen/SiliconFlow, Moonshot, Mistral, Z.AI and more. Send the cheap, high-volume work to a budget model; keep a strong model for the hard parts.
Opinions & guides: a hands-on review testing the "cut your bill by up to 80%" claim at AI Tool Analysis; a setup & cost-control walkthrough at TokenMix. The catch: it's a community-maintained project, and routing to non-Anthropic models means you're no longer running pure Claude — verify output quality on your own tasks before trusting the savings.
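The category idea can be sketched in a few lines. This is a conceptual illustration only: the model names, the token threshold, the `Request` fields and the classification rules are all hypothetical, and none of it is claude-code-router's actual configuration format or routing logic.

```python
from dataclasses import dataclass

# Hypothetical category -> model table; swap in whatever providers you use.
ROUTES = {
    "background": "cheap-model",
    "reasoning": "strong-model",
    "long_context": "long-context-model",
    "web_search": "search-capable-model",
}
DEFAULT_MODEL = "strong-model"
LONG_CONTEXT_THRESHOLD = 60_000  # assumed cutoff, in prompt tokens

@dataclass
class Request:
    prompt_tokens: int
    is_background: bool = False
    needs_reasoning: bool = False
    uses_web_search: bool = False

def classify(request: Request) -> str:
    """Assign a request to a category; the first matching rule wins."""
    if request.uses_web_search:
        return "web_search"
    if request.prompt_tokens > LONG_CONTEXT_THRESHOLD:
        return "long_context"
    if request.needs_reasoning:
        return "reasoning"
    if request.is_background:
        return "background"
    return "default"

def pick_model(request: Request) -> str:
    """Look up the model for the request's category, falling back to the default."""
    return ROUTES.get(classify(request), DEFAULT_MODEL)

print(pick_model(Request(prompt_tokens=800, is_background=True)))
print(pick_model(Request(prompt_tokens=120_000)))
```

Note the design choice: anything that does not match a rule falls through to the strong model, so a misclassified request costs money rather than quality.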
2. LiteLLM — the team gateway with budgets
BerriAI/litellm (~52k★, open source) is an OpenAI-compatible proxy in front of 100+ providers, with the governance most teams actually need: per-key budgets, fallback chains, load balancing, and built-in cost tracking and logging. It doesn't decide "is this query easy?" for you (you wire the routing rules yourself), but it gives you hard spend caps and one place to see where the money goes. Best when several people or apps share an AI budget. TrueFoundry has a comparison of LiteLLM against the hosted OpenRouter.
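To show what "per-key budgets" and "fallback chains" mean mechanically, here is a plain-Python sketch. It does not use LiteLLM's API; the class, function names, flat per-call costs and stub models are all hypothetical and exist only to illustrate the two ideas.

```python
class BudgetExceeded(Exception):
    """Raised when a call would push a key past its spend cap."""

class BudgetedKey:
    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def reserve(self, cost_usd: float) -> None:
        # Refuse before any money is spent.
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(f"cap of ${self.limit_usd:.2f} would be exceeded")
        self.spent_usd += cost_usd

    def release(self, cost_usd: float) -> None:
        self.spent_usd -= cost_usd

def call_with_fallbacks(prompt, chain, key):
    """Try each (name, call_fn, cost_usd) in order; return the first success."""
    errors = []
    for name, call_fn, cost_usd in chain:
        key.reserve(cost_usd)
        try:
            reply = call_fn(prompt)
        except RuntimeError as exc:
            key.release(cost_usd)          # failed call: give the budget back
            errors.append(f"{name}: {exc}")
            continue
        return name, reply
    raise RuntimeError("all models failed: " + "; ".join(errors))

# Stub models standing in for real provider calls.
def flaky_primary(prompt):
    raise RuntimeError("rate limited")

def backup(prompt):
    return f"backup answer to: {prompt}"

key = BudgetedKey(limit_usd=1.00)
chain = [("primary", flaky_primary, 0.50), ("backup", backup, 0.10)]
used_model, reply = call_with_fallbacks("summarize this diff", chain, key)
print(used_model, f"spent=${key.spent_usd:.2f}")
```

A real gateway prices calls by token usage rather than a flat fee, but the control flow is the same: check the cap, try the first model, fall through on failure.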
3. RouteLLM — research-grade strong/weak routing
lm-sys/RouteLLM (~5.1k★, Apache-2.0) from LMSYS (the Chatbot Arena team) trains classifiers on human-preference data to predict, per query, whether a cheap "weak" model will do or you need the "strong" one. Its own README reports routers that "reduce costs by up to 85% while maintaining 95% GPT-4 performance" on MT Bench, evaluated across MT Bench, MMLU and GSM8K. It ships pre-trained routers and an OpenAI-compatible server. The catch: it's research-grade — powerful routing logic, but not a drop-in Claude Code plugin; you integrate it yourself (e.g., behind LiteLLM).
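The strong/weak decision boils down to a score and a threshold. The sketch below uses a toy keyword scorer as a stand-in for a trained classifier; it is not RouteLLM's API or model, and the hint list, model names and threshold are invented for illustration.

```python
# Toy stand-in for a learned router: a real one predicts this score from
# preference data instead of counting keywords.
HARD_HINTS = ("refactor", "architecture", "race condition", "security", "deadlock")

def strong_model_score(prompt: str) -> float:
    """Return a score in [0, 1]; higher means the strong model is more likely needed."""
    text = prompt.lower()
    hits = sum(hint in text for hint in HARD_HINTS)
    return min(1.0, hits / 2)

def route(prompt: str, threshold: float = 0.5) -> str:
    """Send the prompt to the strong model only when the score clears the threshold."""
    return "strong-model" if strong_model_score(prompt) >= threshold else "weak-model"

print(route("Fix this typo in the README"))
print(route("Refactor the auth module and review its security"))
print(route("Refactor this function", threshold=0.75))
```

The threshold is the cost/quality dial: raising it sends more traffic to the weak model, which is why it should be tuned against your own tasks rather than taken from a benchmark.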
4. semantic-router — fast, build-your-own routing
aurelio-labs/semantic-router (~3.6k★, MIT) makes routing decisions by semantic similarity (embeddings) instead of an extra LLM call — so it's fast and cheap to run. Use it to classify intent ("this is a simple lookup" vs "this needs reasoning") and pick the model yourself. It's a building block, not a turnkey Claude Code router.
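Similarity-based routing can be sketched without any model call. The example below substitutes bag-of-words vectors for real embeddings so it runs with the standard library alone; it is not the semantic-router API, and the route names, example utterances and `min_score` cutoff are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical routes, each defined by a few example utterances.
ROUTE_EXAMPLES = {
    "simple_lookup": [
        "what does this function return",
        "rename this variable",
        "fix this typo",
    ],
    "needs_reasoning": [
        "design the architecture for this service",
        "why does this deadlock happen",
        "plan a refactor of the auth module",
    ],
}

def classify_intent(query: str, min_score: float = 0.3):
    """Return the route of the most similar example, or None if nothing is close."""
    query_vec = embed(query)
    best_route, best_score = None, 0.0
    for route_name, examples in ROUTE_EXAMPLES.items():
        for example in examples:
            score = cosine(query_vec, embed(example))
            if score > best_score:
                best_route, best_score = route_name, score
    return best_route if best_score >= min_score else None

print(classify_intent("fix this typo in the readme"))
print(classify_intent("why does this deadlock happen in production"))
print(classify_intent("banana smoothie recipe"))
```

Returning `None` for unfamiliar queries matters in practice: an unmatched request should fall back to your default (usually the strong model) instead of being forced into the nearest bucket.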
Quick comparison
| Tool | ★ (Jun 2026) | Role | Best for | Catch |
|---|---|---|---|---|
| ccusage | ~16.7k | Cost visibility | Seeing where money goes | Measures, doesn't route |
| claude-code-router | ~35k | CC traffic router | Routing Claude Code directly | Non-Anthropic models; community-maintained |
| LiteLLM | ~52k | Gateway + budgets | Teams, spend caps, fallbacks | You write the routing rules |
| RouteLLM | ~5.1k | Strong/weak classifier | Smart per-query routing | Research-grade, integrate yourself |
| semantic-router | ~3.6k | Fast intent routing | Custom, low-latency logic | A building block, not turnkey |
The honest part: when NOT to route
Routing is not free money. A cheaper model that produces wrong code costs you more in debugging than the tokens it saved. Rules I follow:
- Never route the hard reasoning. Architecture, tricky refactors, security-sensitive code — keep the strong model.
- Measure quality, not just cost. Track rework: if cheap-model output gets reverted, the "savings" are negative.
- Native first. Caching + context hygiene + Haiku for trivial edits often cut the bill enough without leaving Anthropic at all — and with zero quality risk.
- Mind the terms. Pointing Claude Code at third-party model backends is a community technique, not an official Anthropic feature.
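The "savings can be negative" rule is simple arithmetic once rework has a price. Every figure in this sketch is a hypothetical placeholder (per-task model costs, rework rate, cost of one rework); plug in your own measurements.

```python
def net_savings_usd(tasks: int, strong_cost: float, cheap_cost: float,
                    rework_rate: float, rework_cost: float) -> float:
    """Token savings from routing minus the cost of redoing bad cheap-model output."""
    token_savings = tasks * (strong_cost - cheap_cost)
    rework_penalty = tasks * rework_rate * rework_cost
    return token_savings - rework_penalty

def break_even_rework_rate(strong_cost: float, cheap_cost: float, rework_cost: float) -> float:
    """Rework rate above which routing loses money."""
    return (strong_cost - cheap_cost) / rework_cost

# Hypothetical: 1,000 routed tasks, $0.30 vs $0.03 per task, $5 per rework.
low_rework = net_savings_usd(1000, 0.30, 0.03, rework_rate=0.02, rework_cost=5.0)
high_rework = net_savings_usd(1000, 0.30, 0.03, rework_rate=0.10, rework_cost=5.0)
threshold = break_even_rework_rate(0.30, 0.03, rework_cost=5.0)

print(f"2% rework:  ${low_rework:+.2f}")
print(f"10% rework: ${high_rework:+.2f}")
print(f"break-even rework rate: {threshold:.1%}")
```

With these assumed numbers, routing stops paying for itself once a little over 5% of cheap-model outputs need to be redone, which is why rework has to be tracked alongside cost.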
Recommended setup
- Measure with ccusage — get your baseline cost per day/model.
- Optimize natively: caching, `/compact`, Haiku/Sonnet for routine work.
- Route only the easy, high-volume tasks: claude-code-router for solo use, LiteLLM when a team shares a budget.
- Re-measure with ccusage and track rework — keep the routing only where quality holds.
FAQ
What's the difference between optimizing tokens and using a router?
Token optimization (caching, smaller context, Haiku) makes the same Claude work cheaper with no quality loss. A router sends some work to a different, cheaper model — bigger savings, but a quality trade-off you must verify.
Will a router break Claude Code?
Tools like claude-code-router run a local gateway and pass requests through; many tasks work fine on budget models. But it's community-maintained and you're switching models, so test on your real workflow first.
Cheapest setup that doesn't hurt quality?
ccusage to measure + native optimization (caching, /compact, Haiku for trivial edits). That alone often removes most waste without any router.
Are these free?
The tools are open source (MIT/Apache). You still pay for whatever model tokens you consume — the point is consuming fewer, cheaper ones.