Production savings.
Real numbers from real traffic: 6,084 requests across multiple clients, verified by backtest.
We backtested every request in our production database against what it would have cost without us. The savings come from removing tokens, not just from switching models. Customers who bring their own model (BYOK) and keep it still save over 83%.
Where the savings come from
| Layer | Share of Savings | What it does |
|---|---|---|
| Windowing | ~61% | Summarizes old messages, keeps recent ones detailed. Hundreds of millions of tokens removed. |
| Routing | ~24% | Sends simple requests to cheaper models. Complex requests stay on your chosen model. |
| Caching | ~2% | System prompts and repeated context cached at the provider level. 80%+ hit rate. |
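The windowing row can be illustrated with a minimal sketch: keep the most recent messages verbatim and collapse everything older into one summary message. This is not the production implementation. The `Message` type, the `keep_recent` parameter, and the truncating `summarize` stand-in are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def summarize(messages: list[Message], max_chars: int = 200) -> str:
    """Stand-in summarizer (assumption): joins the old messages and truncates.
    A real system would presumably use a model or a smarter heuristic."""
    joined = " ".join(f"{m.role}: {m.content}" for m in messages)
    return joined[:max_chars]


def window(messages: list[Message], keep_recent: int = 4) -> list[Message]:
    """Keep the last `keep_recent` messages as-is and replace all older
    messages with a single summary message."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(messages) <= keep_recent:
        return list(messages)
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = Message("system", "Summary of earlier conversation: " + summarize(old))
    return [summary] + recent


# Hypothetical 20-message history with bulky content, e.g. tool output.
history = [Message("user", f"step {i} " + "x" * 500) for i in range(20)]
trimmed = window(history, keep_recent=4)
print(len(history), "->", len(trimmed))  # 20 -> 5
```

The sketch ignores details a real windowing layer would need, such as preserving the original system prompt and keeping tool calls paired with their results.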
Bring your own model
Windowing and caching reduce tokens regardless of which model you use. We backtested our production traffic against every major model at their actual pricing. The result was the same across the board:
| Model | Savings |
|---|---|
| Claude Opus 4.6 | 83.4% |
| Claude Sonnet 4.5 | 83.4% |
| GPT-4o | 83.5% |
| DeepSeek V3 | 83.5% |
| Gemini Flash | 83.5% |
Each row compares the same production traffic on the same model, with and without windowing and caching applied. Your model, fewer tokens.
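One way to see why the percentages barely move between models: if an optimization removes a fixed share of tokens, the percentage saved depends on token counts and on the ratio of input to output price, not on the absolute price level. The sketch below uses made-up token counts and made-up price points (not real vendor pricing) to show the effect.

```python
def cost(input_tokens: int, output_tokens: int, price_in: float, price_out: float) -> float:
    """Cost in dollars, with prices given in dollars per million tokens."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def savings_pct(baseline: tuple[int, int], optimized: tuple[int, int],
                price_in: float, price_out: float) -> float:
    """Percentage saved when moving from baseline to optimized token counts."""
    base = cost(*baseline, price_in, price_out)
    opt = cost(*optimized, price_in, price_out)
    return 100 * (1 - opt / base)


# Hypothetical (input_tokens, output_tokens) before and after optimization.
baseline = (1_000_000_000, 2_000_000)
optimized = (150_000_000, 2_000_000)

# Hypothetical (input $/M, output $/M) price points for three model tiers.
prices = {"expensive": (15.0, 75.0), "mid": (3.0, 15.0), "cheap": (0.1, 0.4)}

for name, (p_in, p_out) in prices.items():
    print(f"{name}: {savings_pct(baseline, optimized, p_in, p_out):.1f}% saved")
```

With these inputs the three tiers land within a fraction of a percentage point of each other, even though their absolute prices differ by more than 100x.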
It depends on your usage
Long agentic conversations with tool calls save the most. Short single-turn requests save the least.
| Usage pattern | Typical savings | Why |
|---|---|---|
| Heavy agent (long sessions, many tool calls) | 85–90% | Windowing removes massive context buildup from tool chains |
| Light agent (short sessions, few tools) | 30–50% | Less context to window, routing still helps |
| Repetitive tasks (status checks, lookups) | ~100% | Handled without calling an LLM at all |
Check your own savings
curl -s 'https://api.clawzempic.ai/v1/insights?period=7d' -H 'x-api-key: sk-clwz-YOUR_KEY'
Returns your baseline cost, actual cost, savings percentage, model mix, cache hit rates, and daily trends. Periods: 24h, 7d, 30d. Or log in at www.clawzempic.ai/dash to see it visually.
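The same call can be made from a script. The sketch below uses only the Python standard library and mirrors the curl command: same URL, same `x-api-key` header, and the three documented periods. It assumes the response body is JSON. The response field names are not documented here, so the code returns the decoded body without reading specific keys. The function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.clawzempic.ai/v1/insights"
VALID_PERIODS = {"24h", "7d", "30d"}  # the periods listed above


def build_insights_request(api_key: str, period: str = "7d") -> urllib.request.Request:
    """Build the GET request without sending it."""
    if period not in VALID_PERIODS:
        raise ValueError(f"period must be one of {sorted(VALID_PERIODS)}")
    query = urllib.parse.urlencode({"period": period})
    return urllib.request.Request(f"{BASE_URL}?{query}", headers={"x-api-key": api_key})


def fetch_insights(api_key: str, period: str = "7d", timeout: float = 10.0):
    """Send the request and decode the body, assuming it is JSON."""
    request = build_insights_request(api_key, period)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `fetch_insights("sk-clwz-YOUR_KEY", "30d")` would then return the decoded body for the last 30 days.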
Pricing
Free
Full optimization until we've saved you $30. Then requests pass through to your provider directly. No credit card.
Core
Unlimited optimized spend. All layers active. At 87% savings on a $500/mo spend, this pays for itself in the first hour.
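The figures in the two tiers can be turned into rough arithmetic. The sketch below assumes the 87% savings rate quoted above applies uniformly. Actual rates vary by usage pattern, and the Core plan price is not stated in this section, so payback time is not computed.

```python
def baseline_spend_to_reach(saved_target: float, savings_rate: float) -> float:
    """Baseline (unoptimized) spend at which cumulative savings hit the target."""
    return saved_target / savings_rate


def monthly_breakdown(baseline: float, savings_rate: float) -> tuple[float, float]:
    """Return (amount saved, amount still paid) for a given baseline spend."""
    saved = baseline * savings_rate
    return saved, baseline - saved


# Free tier: how much baseline spend it takes to accumulate $30 in savings.
print(f"${baseline_spend_to_reach(30, 0.87):.2f}")  # $34.48

# Core tier example: $500/mo baseline at 87% savings.
saved, paid = monthly_breakdown(500, 0.87)
print(f"saved ${saved:.2f}, paid ${paid:.2f}")  # saved $435.00, paid $65.00
```

At a lower savings rate, such as the 30–50% range quoted for light agents, the free tier would cover proportionally more baseline spend before reaching the $30 threshold.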