Ile trwa zbudowanie MVP?

Od pomysłu do działającego produktu w produkcji w 2–4 tygodnie. W modelu MVP Sprint pracuję jako one-man software factory z Claude Code i custom agentami, więc skracam czas dostarczenia bez zespołu i overheadu agencji.

Ile kosztuje współpraca?

MVP Sprint to 15 000–30 000 PLN jednorazowo za zbudowanie produktu. Po wdrożeniu dostępny jest Builder Retainer (5 000–10 000 PLN/mies) — Twój developer na abonament — oraz Automation Pack (3 000–8 000 PLN/mies) na automatyzacje Claude Code + MCP.

Co to jest MCP server i po co mi?

MCP (Model Context Protocol) server to most między modelem AI a Twoimi systemami — bazą danych, API, narzędziami. Dzięki niemu asystent AI realnie wykonuje zadania w Twojej firmie (np. czyta zamówienia, generuje raporty, integruje usługi), a nie tylko odpowiada na pytania. Buduję custom MCP servery pod konkretny proces.

Dla kogo jest ta oferta?

Dla solopreneurów i tech-founderów, którzy chcą szybko zwalidować produkt, oraz dla firm, które potrzebują automatyzacji i integracji AI bez budowania własnego zespołu deweloperskiego.

Jak zacząć współpracę?

Zarezerwuj bezpłatny 30-minutowy audyt na https://bartoszgaca.pl/audit/. Omawiamy pomysł, zakres i najszybszą ścieżkę do produkcji, a Ty dostajesz konkretny plan działania.

LLM Routers for Claude Code: How to Cut Your AI Costs in 2026

Claude Code bills are mostly Opus tokens spent on work that a far cheaper model could have done. The fix has two levers: optimize what Claude Code already does (caching, context hygiene, model tiers), then add an LLM router that sends easy tasks to a cheap model and reserves the expensive one for hard reasoning. Below are the open-source routers actually used for this in 2026 — each with its real GitHub repo, current star count (verified via the GitHub API, June 2026), what it's best for, and the honest catch.

TL;DR

Measure first: ccusage (~16.7k★) reads Claude Code's local logs and shows cost per day/model/session. You can't cut what you can't see.

Route Claude Code itself: claude-code-router (~35k★) sends background/simple tasks to cheaper providers (DeepSeek, Qwen, etc.) and keeps a strong model for reasoning.

Team gateway: LiteLLM (~52k★) — one API for 100+ models with budgets, fallbacks and cost tracking.

The honest catch: routing "saves money" by switching to different, cheaper models — not by making Claude itself cheaper. That's a quality trade-off you must measure, not assume.

Why Claude Code gets expensive

Most spend isn't the hard problems — it's volume. Every file read, every retry, every "fix this typo" runs through the same expensive model with a growing context window. Two things compound the bill: large contexts (you pay for every token in the window on every turn) and using a top-tier model for trivial work. Routers attack the second; good context hygiene attacks the first.

Step 0 — Measure before you optimize

ccusage (~16.7k★) is a CLI that parses Claude Code's local JSONL logs and breaks cost down by day, week, month, session and model — including cache-creation vs cache-read tokens and the 5-hour billing window. Run it before and after any change so you optimize against real numbers, not vibes.

npx ccusage@latest        # daily cost report from your local Claude Code logs

Step 1 — Native optimization (no quality loss)

Before routing anything, harvest the free wins inside Claude Code — these keep you on Anthropic models, so there's no quality trade-off:

Prompt caching — reused context (system prompt, files) is billed at a fraction of input price on cache reads.
Context hygiene — /compact and /clear to drop stale history; smaller windows = fewer tokens per turn.
Model tiers — use Haiku/Sonnet for routine edits and reserve Opus for genuinely hard reasoning.

I covered these in depth in the Claude Code token-optimization guide — start there, because routing on top of a bloated context just moves the waste to a cheaper bill.

Step 2 — The LLM routers (route easy work to cheap models)

1. claude-code-router — route Claude Code's own traffic

The most direct option for this use case. musistudio/claude-code-router (~35k★, MIT) is a local gateway (default http://127.0.0.1:3456) that intercepts Claude Code requests and routes them by category — background tasks, reasoning, long-context, web-search — to whichever provider you choose: OpenRouter, DeepSeek, Qwen/SiliconFlow, Moonshot, Mistral, Z.AI and more. Send the cheap, high-volume work to a budget model; keep a strong model for the hard parts.

Opinions & guides: a hands-on review testing the "cut your bill by up to 80%" claim at AI Tool Analysis; a setup & cost-control walkthrough at TokenMix. The catch: it's a community-maintained project, and routing to non-Anthropic models means you're no longer running pure Claude — verify output quality on your own tasks before trusting the savings.

2. LiteLLM — the team gateway with budgets

BerriAI/litellm (~52k★, open source) is an OpenAI-compatible proxy in front of 100+ providers, with the governance most teams actually need: per-key budgets, fallback chains, load balancing, and built-in cost tracking and logging. It doesn't decide "is this query easy?" for you — you wire the routing rules — but it gives you hard spend caps and one place to see where the money goes. Best when several people or apps share an AI budget. Comparison vs the hosted OpenRouter at TrueFoundry.

3. RouteLLM — research-grade strong/weak routing

lm-sys/RouteLLM (~5.1k★, Apache-2.0) from LMSYS (the Chatbot Arena team) trains classifiers on human-preference data to predict, per query, whether a cheap "weak" model will do or you need the "strong" one. Its own README reports routers that "reduce costs by up to 85% while maintaining 95% GPT-4 performance" on MT Bench, evaluated across MT Bench, MMLU and GSM8K. It ships pre-trained routers and an OpenAI-compatible server. The catch: it's research-grade — powerful routing logic, but not a drop-in Claude Code plugin; you integrate it yourself (e.g., behind LiteLLM).

4. semantic-router — fast, build-your-own routing

aurelio-labs/semantic-router (~3.6k★, MIT) makes routing decisions by semantic similarity (embeddings) instead of an extra LLM call — so it's fast and cheap to run. Use it to classify intent ("this is a simple lookup" vs "this needs reasoning") and pick the model yourself. It's a building block, not a turnkey Claude Code router.

Quick comparison

Tool	★ (Jun 2026)	Role	Best for	Catch
ccusage	~16.7k	Cost visibility	Seeing where money goes	Measures, doesn't route
claude-code-router	~35k	CC traffic router	Routing Claude Code directly	Non-Anthropic models; community-maintained
LiteLLM	~52k	Gateway + budgets	Teams, spend caps, fallbacks	You write the routing rules
RouteLLM	~5.1k	Strong/weak classifier	Smart per-query routing	Research-grade, integrate yourself
semantic-router	~3.6k	Fast intent routing	Custom, low-latency logic	A building block, not turnkey

The honest part: when NOT to route

Routing is not free money. A cheaper model that produces wrong code costs you more in debugging than the tokens it saved. Rules I follow:

Never route the hard reasoning. Architecture, tricky refactors, security-sensitive code — keep the strong model.
Measure quality, not just cost. Track rework: if cheap-model output gets reverted, the "savings" are negative.
Native first. Caching + context hygiene + Haiku for trivial edits often cut the bill enough without leaving Anthropic at all — and with zero quality risk.
Mind the terms. Pointing Claude Code at third-party model backends is a community technique, not an official Anthropic feature.

Recommended setup

Measure with ccusage — get your baseline cost per day/model.
Optimize natively — caching, /compact, Haiku/Sonnet for routine work.
Route only the easy, high-volume tasks — claude-code-router for solo use, LiteLLM when a team shares a budget.
Re-measure with ccusage and track rework — keep the routing only where quality holds.

FAQ

What's the difference between optimizing tokens and using a router?
Token optimization (caching, smaller context, Haiku) makes the same Claude work cheaper with no quality loss. A router sends some work to a different, cheaper model — bigger savings, but a quality trade-off you must verify.

Will a router break Claude Code?
Tools like claude-code-router run a local gateway and pass requests through; many tasks work fine on budget models. But it's community-maintained and you're switching models, so test on your real workflow first.

Cheapest setup that doesn't hurt quality?
ccusage to measure + native optimization (caching, /compact, Haiku for trivial edits). That alone often removes most waste without any router.

Are these free?
The tools are open source (MIT/Apache). You still pay for whatever model tokens you consume — the point is consuming fewer, cheaper ones.

Need this set up properly?

I build and run Claude Code, MCP servers and cost-aware routing for solopreneurs and teams — measured savings, not guesswork. If you want your AI bill cut without wrecking output quality, book a free 20-minute Fit Call.

Bartosz Gaca — MVPs, MCP Servers, Claude Code

Usługi — od pomysłu do produkcji w 2–4 tygodnie

Najczęstsze pytania