Intelligent Routing

Clawzempic scores the complexity of every request and routes it to the most cost-effective model that can handle it well.

How it works

Each incoming request is analyzed across multiple dimensions:

Message length and token count
Presence of technical terms, code, or reasoning markers
Conversation depth (how many messages and tool calls)
Whether the request involves high-stakes domains (security, financial, legal)

Based on the score, the request is routed to one of four tiers:

Tier	Handles	Typical share
Simple	Greetings, acknowledgments, short factual questions	~75%
Mid	Moderate tasks, standard tool use	~18%
Complex	Your primary model — always used for demanding tasks	~5%
Reasoning	Deep analysis, proofs, multi-step logic	~2%
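
As a rough illustration of how a score could map onto these tiers, here is a minimal sketch. It is not Clawzempic's actual scoring logic. It assumes the score lives on the same -1 to 1 scale as the boundary settings described later on this page, and it reuses the example values 0.1845 and 0.5855 from there. The simple/mid cutoff (`MID_BOUNDARY`) and the function name `pick_tier` are hypothetical.

```python
# Boundary values copied from the settings example on this page.
COMPLEX_BOUNDARY = 0.1845
REASONING_BOUNDARY = 0.5855
# Hypothetical: the simple/mid cutoff is not documented.
MID_BOUNDARY = -0.25


def pick_tier(score: float,
              mid_boundary: float = MID_BOUNDARY,
              complex_boundary: float = COMPLEX_BOUNDARY,
              reasoning_boundary: float = REASONING_BOUNDARY) -> str:
    """Map a complexity score in [-1, 1] to a tier name (illustrative only)."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1 and 1")
    # Check the most demanding tier first so higher scores win.
    if score >= reasoning_boundary:
        return "reasoning"
    if score >= complex_boundary:
        return "complex"
    if score >= mid_boundary:
        return "mid"
    return "simple"


for s in (-0.9, 0.0, 0.3, 0.8):
    print(s, pick_tier(s))
```

Under these assumptions, lowering a boundary widens the range of scores that reach the corresponding tier.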

The key insight

Most bot conversations are dominated by simple exchanges such as "Hi", "thanks", or "what time is standup?". These don't need Opus or even Sonnet. Routing them to a model that costs one-fifth as much cuts spend substantially, while demanding tasks still go to your primary model and lose no quality.
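
To make the arithmetic concrete, here is a back-of-the-envelope estimate using only the figures on this page: a ~75% simple share and a simple-tier model at one-fifth the price. It assumes every request consumes the same number of tokens and that all non-simple traffic is billed at the primary model's price, so treat the result as a rough illustration rather than a forecast.

```python
def blended_cost(simple_share: float, simple_price_ratio: float) -> float:
    """Relative cost per request versus sending everything to the primary model.

    Assumes equal-sized requests and that all non-simple traffic is billed
    at the primary model's price (relative cost 1.0).
    """
    return simple_share * simple_price_ratio + (1.0 - simple_share) * 1.0


relative = blended_cost(simple_share=0.75, simple_price_ratio=0.2)
savings = 1.0 - relative
print(f"relative cost: {relative:.2f}, savings: {savings:.0%}")
```

With these inputs the blended cost is about 0.40 of the all-primary baseline, or roughly 60% savings.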

Pinning a model

To make a specific request bypass routing and use your primary model, send this header:

http header

x-model-pinned: true

Or force a specific tier by sending the header with one of these values:

http header

x-model-tier: haiku
x-model-tier: opus
x-model-tier: reasoning
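
A small helper can keep these headers consistent across calls. The sketch below only builds the header dictionary. The function name `routing_headers` is hypothetical, the tier list is limited to the three values shown above, and treating the two headers as mutually exclusive is an assumption of this sketch, not documented behavior.

```python
from typing import Dict, Optional

# Tier values shown in the header examples above.
KNOWN_TIERS = {"haiku", "opus", "reasoning"}


def routing_headers(pinned: bool = False,
                    tier: Optional[str] = None) -> Dict[str, str]:
    """Return the routing override headers for a single request."""
    # Assumption: pin OR force a tier, not both in the same request.
    if pinned and tier is not None:
        raise ValueError("choose either pinned=True or a tier, not both")
    if tier is not None and tier not in KNOWN_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    headers: Dict[str, str] = {}
    if pinned:
        headers["x-model-pinned"] = "true"
    if tier is not None:
        headers["x-model-tier"] = tier
    return headers


print(routing_headers(pinned=True))
print(routing_headers(tier="haiku"))
```

Merge the returned dictionary into the headers of whichever HTTP client you already use.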

Customizing boundaries

You can adjust routing sensitivity per-client via PATCH /v1/settings:

json

{
  "routing": {
    "enabled": true,
    "intelligenceLevel": 50,
    "complexBoundary": 0.1845,
    "reasoningBoundary": 0.5855
  }
}

intelligenceLevel (0-100): Higher values route more traffic to expensive models
complexBoundary (-1 to 1): Lower values make "complex" easier to trigger
reasoningBoundary (-1 to 1): Lower values make "reasoning" easier to trigger
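
The following sketch shows one way to build that PATCH request with Python's standard library, validating the documented ranges before sending. The base URL is a placeholder and the bearer-token `Authorization` header is an assumption; substitute whatever host and authentication your account actually uses. The function name `build_settings_request` is hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder, not the real host


def build_settings_request(api_key: str,
                           intelligence_level: int,
                           complex_boundary: float,
                           reasoning_boundary: float,
                           enabled: bool = True) -> urllib.request.Request:
    """Build (but do not send) a PATCH /v1/settings request."""
    # Ranges as documented above.
    if not 0 <= intelligence_level <= 100:
        raise ValueError("intelligenceLevel must be between 0 and 100")
    for name, value in (("complexBoundary", complex_boundary),
                        ("reasoningBoundary", reasoning_boundary)):
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between -1 and 1")
    body = {
        "routing": {
            "enabled": enabled,
            "intelligenceLevel": intelligence_level,
            "complexBoundary": complex_boundary,
            "reasoningBoundary": reasoning_boundary,
        }
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; adjust to your setup.
            "Authorization": f"Bearer {api_key}",
        },
        method="PATCH",
    )


req = build_settings_request("YOUR_API_KEY", 50, 0.1845, 0.5855)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```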

💡 The dashboard includes an IQ slider that maps to these boundary values. Drag it toward "Quality" to send more traffic to your primary model, or toward "Savings" to maximize cost reduction.
