
Intelligent Routing

Clawzempic scores the complexity of every request and routes it to the most cost-effective model that can handle it well.

How it works

Each incoming request is analyzed across multiple dimensions:

  • Message length and token count
  • Presence of technical terms, code, or reasoning markers
  • Conversation depth (how many messages and tool calls)
  • Whether the request involves high-stakes domains (security, financial, legal)

Based on the score, the request is routed to one of four tiers:

Tier      | Handles                                                | Typical share
Simple    | Greetings, acknowledgments, short factual questions    | ~75%
Mid       | Moderate tasks, standard tool use                      | ~18%
Complex   | Your primary model, always used for demanding tasks    | ~5%
Reasoning | Deep analysis, proofs, multi-step logic                | ~2%
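
As a rough illustration of how the dimensions and tiers above could fit together, here is a minimal sketch. It is not Clawzempic's actual scoring algorithm: the keyword lists, the weights, the `Request` type, and the `mid_boundary` parameter are all hypothetical. It assumes the score lies between -1 and 1 (the documented range of the boundary settings) and uses the default `complexBoundary` and `reasoningBoundary` values shown under "Customizing boundaries".

```python
import re
from dataclasses import dataclass

# Hypothetical signal lists and weights, chosen only for illustration.
HIGH_STAKES_TERMS = {"security", "financial", "legal"}
REASONING_MARKERS = ("prove", "derive", "step by step")
CODE_HINT = re.compile(r"`|\bdef\b|\bclass\b|[{};]")


@dataclass
class Request:
    text: str
    message_count: int = 1
    tool_calls: int = 0


def complexity_score(req: Request) -> float:
    """Toy complexity score clamped to [-1, 1]; higher means more demanding."""
    lowered = req.text.lower()
    words = lowered.split()
    score = -1.0
    # Message length (word count stands in for token count here).
    score += 0.6 * min(len(words) / 200, 1.0)
    # Presence of code or reasoning markers.
    if CODE_HINT.search(req.text):
        score += 0.4
    if any(marker in lowered for marker in REASONING_MARKERS):
        score += 0.5
    # Conversation depth: messages plus tool calls.
    score += 0.3 * min((req.message_count + req.tool_calls) / 20, 1.0)
    # High-stakes domains.
    if HIGH_STAKES_TERMS.intersection(words):
        score += 0.4
    return max(-1.0, min(1.0, score))


def pick_tier(score: float,
              mid_boundary: float = -0.5,          # hypothetical, not documented
              complex_boundary: float = 0.1845,    # default from the settings example
              reasoning_boundary: float = 0.5855) -> str:
    """Map a score to a tier; a lower boundary makes its tier easier to trigger."""
    if score >= reasoning_boundary:
        return "reasoning"
    if score >= complex_boundary:
        return "complex"
    if score >= mid_boundary:
        return "mid"
    return "simple"


print(pick_tier(complexity_score(Request("Hi"))))  # simple
```

Checking the highest boundary first keeps the mapping unambiguous: each score falls into exactly one tier.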

The key insight

Most bot conversations are dominated by simple exchanges. Messages like "Hi", "thanks", or "what time is standup?" don't need Opus or even Sonnet. Routing them to a model that costs one-fifth as much cuts spend substantially, with no quality loss on the tasks that matter.
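
To make the arithmetic concrete, the sketch below combines the typical tier shares from the table with the one-fifth price figure. It rests on two loud assumptions: every request consumes the same number of tokens, and the Mid, Complex, and Reasoning tiers all cost as much as the primary model (normalized to 1.0). Real traffic will differ, since simple requests are usually shorter and the other tiers are priced differently, so treat the result as an illustration rather than a forecast.

```python
def blended_cost(shares: dict, relative_prices: dict) -> float:
    """Average cost per request, relative to sending everything to the primary model."""
    return sum(shares[tier] * relative_prices[tier] for tier in shares)


# Typical shares from the tier table.
shares = {"simple": 0.75, "mid": 0.18, "complex": 0.05, "reasoning": 0.02}

# Assumed prices: primary model = 1.0, simple tier = 1/5 of that.
# Pricing the other tiers at 1.0 is a placeholder, not a documented figure.
prices = {"simple": 0.2, "mid": 1.0, "complex": 1.0, "reasoning": 1.0}

cost = blended_cost(shares, prices)
print(f"relative cost: {cost:.2f}, savings: {1 - cost:.0%}")
# relative cost: 0.40, savings: 60%
```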

Pinning a model

To make a specific request bypass routing and use your primary model, send this header:

http header
x-model-pinned: true

Or force a specific tier by sending one of these values:

http header
x-model-tier: haiku
x-model-tier: opus
x-model-tier: reasoning
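
A client might wrap these headers in a small helper, sketched below. The header names and tier values come from this page; the helper names and the `https://api.example.com/v1/messages` URL are hypothetical placeholders. The sketch refuses to combine pinning with a forced tier only because this page does not say which header would take precedence.

```python
import urllib.request
from typing import Optional

# Tier values listed on this page.
ALLOWED_TIERS = {"haiku", "opus", "reasoning"}


def routing_headers(pinned: bool = False, tier: Optional[str] = None) -> dict:
    """Build the routing override headers for a single request."""
    if pinned and tier is not None:
        # Precedence between the two headers is not documented, so avoid mixing them.
        raise ValueError("choose either pinned=True or a tier, not both")
    if pinned:
        return {"x-model-pinned": "true"}
    if tier is not None:
        if tier not in ALLOWED_TIERS:
            raise ValueError(f"unknown tier: {tier!r}")
        return {"x-model-tier": tier}
    return {}  # no override: normal routing applies


def build_request(body: bytes, headers: dict) -> urllib.request.Request:
    """Attach the headers to a POST request (placeholder URL; nothing is sent here)."""
    url = "https://api.example.com/v1/messages"  # hypothetical endpoint
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_request(b"{}", routing_headers(tier="opus"))
```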

Customizing boundaries

You can adjust routing sensitivity per client with PATCH /v1/settings:

json
{
  "routing": {
    "enabled": true,
    "intelligenceLevel": 50,
    "complexBoundary": 0.1845,
    "reasoningBoundary": 0.5855
  }
}
  • intelligenceLevel (0-100): Higher values route more traffic to expensive models
  • complexBoundary (-1 to 1): Lower values make "complex" easier to trigger
  • reasoningBoundary (-1 to 1): Lower values make "reasoning" easier to trigger
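
Before sending a settings update, it can help to check the values against the documented ranges on the client side. The following is a minimal sketch: the function name is hypothetical, and it only builds the JSON body for PATCH /v1/settings without sending it. How the server treats out-of-range values is not described here, so the local checks are a precaution rather than a mirror of server behavior.

```python
import json


def routing_settings(intelligence_level: int = 50,
                     complex_boundary: float = 0.1845,
                     reasoning_boundary: float = 0.5855,
                     enabled: bool = True) -> str:
    """Return a JSON body for PATCH /v1/settings after range-checking each field."""
    if not 0 <= intelligence_level <= 100:
        raise ValueError("intelligenceLevel must be between 0 and 100")
    for name, value in (("complexBoundary", complex_boundary),
                        ("reasoningBoundary", reasoning_boundary)):
        if not -1 <= value <= 1:
            raise ValueError(f"{name} must be between -1 and 1")
    return json.dumps({
        "routing": {
            "enabled": enabled,
            "intelligenceLevel": intelligence_level,
            "complexBoundary": complex_boundary,
            "reasoningBoundary": reasoning_boundary,
        }
    })


# Lower the complex boundary so "complex" triggers more easily.
body = routing_settings(intelligence_level=70, complex_boundary=0.1)
```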
💡 The dashboard includes an IQ slider that maps to these boundary values. Drag it toward "Quality" to send more traffic to your primary model, or toward "Savings" to maximize cost reduction.