Model Routing

SandBase acts as an intelligent gateway between your application and multiple LLM providers. When you send a request, SandBase determines the best provider to handle it based on model availability, capabilities, and your configuration.

How Routing Works

When a request arrives at SandBase, it goes through a multi-step routing pipeline:

Request → Authentication → Rate Limiting → Balance Check → Model Resolution → Capability Filter → Route Selection → Provider

1. Model Resolution

SandBase maps your requested model name to one or more provider endpoints. For example, claude-sonnet-4 might be available through:

OpenRouter (default fallback)
Anthropic Direct (lower latency, optional)

python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.sandbase.ai/v1",
    api_key="sk-sb-your-key"
)

# SandBase resolves "claude-sonnet-4" to the best available provider
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
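Conceptually, model resolution is a lookup from the requested name to a list of candidate provider endpoints. The sketch below is illustrative only, not SandBase's implementation: `ProviderEndpoint`, `MODEL_ENDPOINTS`, `resolve_model`, and the `direct_enabled` flag are hypothetical names, and the table simply mirrors the `claude-sonnet-4` example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderEndpoint:
    provider: str        # e.g. "openrouter" or "anthropic"
    upstream_model: str  # model identifier sent to that provider


# Hypothetical resolution table; a real gateway would load this from configuration.
MODEL_ENDPOINTS = {
    "claude-sonnet-4": [
        ProviderEndpoint("anthropic", "claude-sonnet-4"),             # direct, optional
        ProviderEndpoint("openrouter", "anthropic/claude-sonnet-4"),  # default fallback
    ],
}


def resolve_model(requested: str, direct_enabled: bool = False) -> list[ProviderEndpoint]:
    """Return candidate endpoints for a model, dropping direct providers that are not configured."""
    endpoints = MODEL_ENDPOINTS.get(requested)
    if endpoints is None:
        raise ValueError(f"unknown model: {requested}")
    return [e for e in endpoints if direct_enabled or e.provider == "openrouter"]
```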

2. Capability Filtering

SandBase inspects your request to determine which capabilities are required, then filters out providers that don't support them.

Capabilities detected from request parameters:

Request Feature	Required Capability
`tools` parameter present	`tools`
Image content in messages	`vision`
`reasoning` or `thinking` parameter	`thinking`
`response_format: { type: "json_schema" }`	`json_schema`
`cache_control` on messages	`cache_control`
`stream: true`	`stream`
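The detection rules in the table can be approximated as a function over the request body. This is a sketch under assumptions: `detect_capabilities` is a hypothetical helper, it inspects only the OpenAI-style fields named in the table, and it treats any `image_url` content part as image input.

```python
def detect_capabilities(request: dict) -> set[str]:
    """Infer the capabilities an OpenAI-style chat request requires."""
    required: set[str] = set()
    if "tools" in request:
        required.add("tools")
    if "reasoning" in request or "thinking" in request:
        required.add("thinking")
    response_format = request.get("response_format") or {}
    if response_format.get("type") == "json_schema":
        required.add("json_schema")
    if request.get("stream") is True:
        required.add("stream")
    for message in request.get("messages", []):
        if "cache_control" in message:
            required.add("cache_control")
        content = message.get("content")
        # Images (and part-level cache_control) only appear in list-form content.
        if isinstance(content, list):
            for part in content:
                if part.get("type") == "image_url":
                    required.add("vision")
                if "cache_control" in part:
                    required.add("cache_control")
    return required
```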

Example: If you send a request with tools and an image in the messages, SandBase only routes to providers that support both tools AND vision for that model.

python

# This request requires: tools + vision
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
            ]
        }
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "describe_image",
            "description": "Describe the image content",
            "parameters": {"type": "object", "properties": {"description": {"type": "string"}}}
        }
    }]
)

3. Priority-Based Selection

After filtering, SandBase selects from remaining candidates using priority-based routing:

Direct providers (if configured) — lowest latency, no intermediary
OpenRouter — universal fallback, supports all models

Priority can be configured per-model in the admin panel. The default behavior routes through OpenRouter unless a direct provider is configured with higher priority.
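Priority-based selection can be pictured as filtering by capability and then sorting by an admin-assigned priority. The sketch assumes that a lower number means higher priority; `Candidate`, `select_by_priority`, and the sample capability sets are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    provider: str
    priority: int  # assumed: lower number = tried first
    capabilities: set[str] = field(default_factory=set)


def select_by_priority(candidates: list[Candidate], required: set[str]) -> list[Candidate]:
    """Drop candidates missing any required capability, then order the rest by priority."""
    capable = [c for c in candidates if required <= c.capabilities]
    return sorted(capable, key=lambda c: c.priority)


candidates = [
    Candidate("openrouter", priority=10, capabilities={"tools", "vision", "stream"}),
    Candidate("anthropic", priority=1, capabilities={"tools", "vision", "stream", "thinking"}),
]
```

With the Anthropic direct connection at priority 1, it is tried first and OpenRouter stays as the fallback; if a required capability is available on only one provider, the others drop out of the list.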

Fallback Behavior

If the primary provider fails (5xx error, timeout, or rate limit), SandBase automatically falls back to the next available provider:

Primary Provider (Anthropic Direct)
    ↓ fails
Fallback Provider (OpenRouter)
    ↓ fails
Error returned to client

Automatic Retry vs Fallback

Scenario	Behavior
Provider returns 5xx	Retry once, then fallback to next provider
Provider returns 429 (rate limited)	Immediate fallback to next provider
Provider timeout (no response in 60s)	Fallback to next provider
All providers fail	Return error to client with details
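The retry and fallback rules in the table map onto a small loop. This is a hypothetical sketch: `ProviderError`, `call_with_fallback`, and the zero-argument `send` callables stand in for real provider calls, and the transport is assumed to raise `TimeoutError` once the 60-second limit passes.

```python
from typing import Callable


class ProviderError(Exception):
    """Hypothetical error carrying the upstream HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def call_with_fallback(providers: list[tuple[str, Callable[[], str]]]) -> tuple[str, str]:
    """Try providers in order and return (provider_name, response); raise if all fail."""
    errors: list[str] = []
    for name, send in providers:
        for _ in range(2):  # at most one retry per provider
            try:
                return name, send()
            except ProviderError as exc:
                errors.append(f"{name}: HTTP {exc.status}")
                if exc.status < 500:
                    # 429 falls back immediately; other non-5xx codes are treated
                    # the same way here (an assumption the table does not cover).
                    break
                # 5xx: loop once more to retry, then move on to the next provider
            except TimeoutError:
                errors.append(f"{name}: timeout")
                break  # timeouts fall back without retrying
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```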

Capability-Aware Routing Errors

If no provider supports all required capabilities for your request, SandBase returns a clear error:

json

{
  "error": {
    "type": "capability_unsupported",
    "message": "No available provider supports all required capabilities for this request.",
    "detail": {
      "required_capabilities": ["thinking", "tools"],
      "missing_for_all_candidates": ["thinking"]
    }
  }
}

HTTP Status: 400 Bad Request
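Because the error carries a structured `detail` object, a client can report exactly which capability blocked routing. The helper below is a hypothetical sketch that operates on the raw JSON body shown above; how you obtain that body depends on your HTTP client or SDK.

```python
import json
from typing import Optional


def explain_routing_error(body: str) -> Optional[str]:
    """Summarize a capability_unsupported error body; return None for any other error type."""
    error = json.loads(body).get("error", {})
    if error.get("type") != "capability_unsupported":
        return None
    detail = error.get("detail", {})
    required = ", ".join(detail.get("required_capabilities", []))
    missing = ", ".join(detail.get("missing_for_all_candidates", []))
    return f"Required [{required}], but no candidate provider supports: {missing}"
```

Because this is a 400 rather than a transient failure, resending the same request will fail the same way; remove the unsupported parameter or choose a model that supports it.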

Capability Degradation Rules

Capability	Behavior When Unsupported
`tools`	Hard filter — request fails if no provider supports it
`thinking`	Hard filter — user explicitly requested reasoning
`vision`	Hard filter — images cannot be processed without it
`cache_control`	Soft — silently ignored, doesn't affect correctness
`json_schema`	Soft: falls back to `json_mode` and adds a warning to the response
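The hard/soft split can be expressed as a per-provider check. In this sketch, `plan_degradation` is a hypothetical helper, capabilities the table does not list (such as `stream`) are assumed to be hard requirements, and `json_mode` is assumed to be available whenever `json_schema` is not.

```python
# Capabilities the degradation table marks as soft. Anything else that is missing
# is treated as a hard failure, which is an assumption for capabilities the table
# does not list (such as "stream").
SOFT = {"cache_control", "json_schema"}


def plan_degradation(required: set[str], supported: set[str]) -> tuple[bool, list[str]]:
    """Decide whether one provider can serve a request; return (usable, warnings)."""
    missing = required - supported
    if missing - SOFT:
        return False, []  # hard filter: tools, thinking, vision (and unlisted capabilities)
    warnings: list[str] = []
    if "json_schema" in missing:
        # Degrade to JSON mode and surface a warning instead of failing.
        warnings.append("json_schema unsupported; using json_mode")
    # A missing cache_control is dropped silently because it does not affect correctness.
    return True, warnings
```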

OpenRouter as Universal Fallback

OpenRouter is the default fallback for every model and every protocol (OpenAI, Anthropic, Gemini). This means:

New users only need a SandBase API key — OpenRouter handles all routing behind the scenes
Production users can add direct provider connections for lower latency or compliance requirements

When to Use Direct Providers

Scenario	Recommendation
Prototyping / development	OpenRouter only (simplest setup)
Latency-sensitive production	Add Anthropic/OpenAI direct connections
Enterprise compliance (no third-party proxy)	Direct connections required
High-volume cost optimization	Direct connections (avoid OpenRouter markup)
Accessing beta/preview models	Direct connections (OpenRouter may lag 1-2 weeks)

Model Aliases

SandBase supports model aliases for convenience:

Alias	Resolves To
`gpt-4o`	`openai/gpt-4o` (latest stable)
`claude-sonnet-4`	`anthropic/claude-sonnet-4`
`gemini-2.5-pro`	`google/gemini-2.5-pro-preview`
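Alias handling amounts to normalizing a short name to its provider-prefixed form. A minimal sketch using only the aliases in the table; `canonical_model` is a hypothetical helper, and any name that already contains a `/` (including virtual models such as `sandbase/auto`) is passed through unchanged.

```python
ALIASES = {
    "gpt-4o": "openai/gpt-4o",
    "claude-sonnet-4": "anthropic/claude-sonnet-4",
    "gemini-2.5-pro": "google/gemini-2.5-pro-preview",
}


def canonical_model(name: str) -> str:
    """Map a short alias to its provider-prefixed name; leave full names as-is."""
    if "/" in name:
        return name
    try:
        return ALIASES[name]
    except KeyError:
        raise ValueError(f"unknown model alias: {name}") from None
```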

You can always use the full provider-prefixed name for explicit routing:

python

messages = [{"role": "user", "content": "Hello!"}]

# These are equivalent
response = client.chat.completions.create(model="gpt-4o", messages=messages)
response = client.chat.completions.create(model="openai/gpt-4o", messages=messages)

Routing Strategy Comparison

SandBase supports multiple routing strategies that determine how requests are distributed across providers. Choose a strategy based on your application's priorities.

Strategy Overview

Strategy	Optimizes For	Trade-off	Best For
Priority-based (default)	Reliability + preference	May not be cheapest	Production apps with preferred providers
Cost-optimized	Lowest token cost	May have higher latency	High-volume batch processing
Latency-optimized	Fastest response time	May cost more	Real-time chat, interactive UIs
Quality-optimized	Best model output	Highest cost	Critical tasks (code review, legal)
Availability-optimized	Highest uptime	Less predictable cost	Mission-critical systems

Detailed Comparison

Dimension	Priority	Cost	Latency	Quality	Availability
Selection logic	Admin-defined order	Cheapest provider first	Lowest p50 latency first	Highest quality score first	Healthiest provider first
Fallback behavior	Next in priority list	Next cheapest	Next fastest	Next highest quality	Next most available
Configuration	Set provider priorities	Automatic	Automatic	Set quality scores	Automatic (health checks)
Predictability	High	Medium	Medium	High	Low
Typical latency	Low–Medium	Medium–High	Lowest	Medium	Variable
Typical cost	Medium	Lowest	Medium–High	Highest	Medium
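Each strategy is effectively a different sort key over the same capability-filtered candidates, and the fallback order is simply the rest of the sorted list. The sketch assumes the per-candidate statistics are already known; the `Stats` fields, `STRATEGY_KEYS`, `rank`, and all numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    name: str
    priority: int          # admin-defined, lower = preferred
    price_per_mtok: float  # illustrative USD per million tokens
    p50_ms: float          # median response time
    quality: float         # higher is better
    health: float          # 0.0 to 1.0, higher is healthier


# Ascending sort keys; "descending" metrics are negated.
STRATEGY_KEYS = {
    "priority": lambda s: s.priority,
    "cost": lambda s: s.price_per_mtok,
    "latency": lambda s: s.p50_ms,
    "quality": lambda s: -s.quality,
    "availability": lambda s: -s.health,
}


def rank(candidates: list[Stats], strategy: str) -> list[str]:
    """Return candidate names in try order: first is selected, the rest are fallbacks."""
    return [s.name for s in sorted(candidates, key=STRATEGY_KEYS[strategy])]


pool = [
    Stats("candidate-a", priority=1, price_per_mtok=3.0, p50_ms=900, quality=0.92, health=0.97),
    Stats("candidate-b", priority=2, price_per_mtok=0.6, p50_ms=400, quality=0.80, health=0.99),
    Stats("candidate-c", priority=3, price_per_mtok=15.0, p50_ms=1500, quality=0.97, health=0.95),
]
```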

Strategy Flow Diagram

mermaid

flowchart TD
    A[Incoming Request] --> B{Capability Filter}
    B -->|No capable provider| E[Return 400 Error]
    B -->|Candidates found| C{Routing Strategy}
    
    C -->|Priority| D1[Sort by admin-defined priority]
    C -->|Cost| D2[Sort by token price ascending]
    C -->|Latency| D3[Sort by p50 response time]
    C -->|Quality| D4[Sort by quality score descending]
    C -->|Availability| D5[Sort by health score descending]
    
    D1 --> F[Select Top Candidate]
    D2 --> F
    D3 --> F
    D4 --> F
    D5 --> F
    
    F --> G{Provider Responds?}
    G -->|Success| H[Return Response]
    G -->|Failure| I{More Candidates?}
    I -->|Yes| J[Fallback to Next Candidate]
    J --> G
    I -->|No| K[Return Error to Client]

Smart Routing (Virtual Models)

SandBase provides virtual model names that automatically select the best real model for your request. Instead of specifying a concrete model, pass one of these virtual names as the model:

Virtual Model	Strategy	Description
`sandbase/auto`	Balanced	Best quality-to-cost ratio across all models
`sandbase/fast`	Latency	Selects the fastest responding model
`sandbase/cheap`	Cost	Selects the cheapest capable model
`sandbase/best`	Quality	Selects the highest-quality model available

How Smart Routing Works

mermaid

sequenceDiagram
    participant App as Your App
    participant SB as SandBase
    participant Router as Route Engine
    participant Provider as LLM Provider

    App->>SB: POST /v1/chat/completions<br/>model: "sandbase/auto"
    SB->>Router: Resolve virtual model
    Router->>Router: Filter by capabilities
    Router->>Router: Score candidates<br/>(quality / cost ratio)
    Router-->>SB: Selected: claude-sonnet-4
    SB->>Provider: Forward request
    Provider-->>SB: Response
    SB-->>App: Response + headers<br/>x-sandbase-model: anthropic/claude-sonnet-4
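For `sandbase/auto`, the "score candidates" step could look like the sketch below. The scoring formula (quality score divided by price), `ModelInfo`, `pick_auto`, and the catalog values are assumptions, not SandBase's documented scoring; capabilities are filtered before scoring, as in the routing pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class ModelInfo:
    name: str
    quality: float         # hypothetical 0-1 quality score
    price_per_mtok: float  # hypothetical blended USD per million tokens
    capabilities: set[str] = field(default_factory=set)


def pick_auto(models: list[ModelInfo], required: set[str]) -> str:
    """Choose the capable model with the highest quality-to-price ratio."""
    capable = [m for m in models if required <= m.capabilities]
    if not capable:
        raise LookupError("no model supports all required capabilities")
    best = max(capable, key=lambda m: m.quality / m.price_per_mtok)
    return best.name


catalog = [
    ModelInfo("model-small", quality=0.70, price_per_mtok=0.5, capabilities={"stream"}),
    ModelInfo("model-mid", quality=0.90, price_per_mtok=3.0, capabilities={"stream", "tools", "vision"}),
    ModelInfo("model-large", quality=0.97, price_per_mtok=15.0,
              capabilities={"stream", "tools", "vision", "thinking"}),
]
```

A pure ratio strongly favors cheap models, so a production scorer would likely weight quality more heavily or enforce a minimum quality bar.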

Code Examples

Python — Using the OpenAI SDK

python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.sandbase.ai/v1",
    api_key="sk-sb-your-key"
)

# Direct model selection (priority-based routing across providers)
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Smart routing — let SandBase pick the best model
response = client.chat.completions.create(
    model="sandbase/auto",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Cost-optimized — cheapest model that can handle the task
response = client.chat.completions.create(
    model="sandbase/cheap",
    messages=[{"role": "user", "content": "Summarize this text: ..."}]
)

# Latency-optimized — fastest response
response = client.chat.completions.create(
    model="sandbase/fast",
    messages=[{"role": "user", "content": "Quick: what is 2+2?"}]
)

# Quality-optimized — best model regardless of cost
response = client.chat.completions.create(
    model="sandbase/best",
    messages=[{"role": "user", "content": "Review this contract for legal issues: ..."}]
)

# Check which model was actually used
print(response.model)  # e.g., "anthropic/claude-sonnet-4"

JavaScript — Using the OpenAI SDK

javascript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.sandbase.ai/v1",
  apiKey: "sk-sb-your-key",
});

// Smart routing with streaming
const stream = await client.chat.completions.create({
  model: "sandbase/auto",
  messages: [{ role: "user", content: "Write a haiku about routing" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

cURL — Direct API Call

bash

# Smart routing
curl -X POST https://api.sandbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-sb-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sandbase/auto",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Check routing headers in the response
curl -v -X POST https://api.sandbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-sb-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }' 2>&1 | grep -i "x-sandbase"

# Response headers:
# x-sandbase-provider: openrouter
# x-sandbase-model: anthropic/claude-sonnet-4
# x-sandbase-route-time-ms: 3

Python — Anthropic SDK (via SandBase)

python

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.sandbase.ai",
    api_key="sk-sb-your-key"
)

# SandBase routes through the Anthropic-compatible endpoint
message = client.messages.create(
    model="claude-sonnet-4",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

Routing Decision Lifecycle

The complete lifecycle of a routing decision:

┌─────────────────────────────────────────────────────────────────────┐
│                           Request Arrives                           │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│  1. Authentication + Rate Limiting + Balance Check                  │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│  2. Model Resolution                                                │
│     • "claude-sonnet-4" → [OpenRouter, Anthropic Direct]            │
│     • "sandbase/auto"   → [All enabled LLM models]                  │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│  3. Capability Filtering                                            │
│     • Detect: tools, vision, thinking, json_schema, etc.            │
│     • Remove providers missing any hard-filtered capability         │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│  4. Strategy-Based Ranking                                          │
│     • Priority: admin-defined order                                 │
│     • Cost: sort by token price                                     │
│     • Latency: sort by p50 response time                            │
│     • Quality: sort by quality score                                │
│     • Availability: sort by health score                            │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│  5. Execute + Fallback                                              │
│     • Try top candidate                                             │
│     • 5xx: retry once, then fall back to next candidate             │
│     • 429 or timeout: fall back immediately                         │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│  6. Response + Routing Metadata Headers                             │
│     • x-sandbase-provider, x-sandbase-model,                        │
│       x-sandbase-route-time-ms                                      │
└─────────────────────────────────────────────────────────────────────┘

Best Practices

Choosing a Routing Strategy

Your Priority	Recommended Approach
Prototyping / exploration	Use `sandbase/auto` — zero config, good defaults
Cost-sensitive batch jobs	Use `sandbase/cheap` or specify a budget model directly
User-facing chat	Use `sandbase/fast` or specify a low-latency model (e.g., `gpt-4o-mini`)
High-stakes tasks	Use `sandbase/best` or specify a frontier model directly
Production with SLAs	Use direct model names + configure provider priorities

General Recommendations

Start with sandbase/auto during development. It gives you good quality at reasonable cost without locking into a specific model.
Use direct model names in production when you need predictable behavior. Smart routing is great for exploration, but production systems benefit from explicit model selection.
Configure provider priorities for your most-used models. If you have an Anthropic direct key, set it as priority 1 for Claude models to reduce latency.
Monitor routing headers to understand where your traffic goes. The x-sandbase-provider header tells you which provider served each request.
Set up fallback chains for critical paths. SandBase automatically falls back on provider failure, but you can configure the fallback order.
Use capability-aware requests — don't send tools or vision parameters unless needed. Extra capabilities narrow the candidate pool and may increase cost.

Cost Optimization Tips

Use sandbase/cheap for non-critical tasks (summarization, classification)
Avoid frontier models for simple tasks — gpt-4o-mini handles most routine work
Enable prompt caching (cache_control) for repeated system prompts
Monitor your routing distribution in the dashboard to spot cost outliers

Latency Optimization Tips

Add direct provider connections (Anthropic, OpenAI) to skip the OpenRouter hop
Use sandbase/fast for interactive applications
Enable streaming (stream: true) for perceived-latency improvement
Keep prompts and requested outputs concise, since both input and output token counts affect response time

Monitoring Routing Decisions

The response headers include routing metadata:

x-sandbase-provider: openrouter
x-sandbase-model: anthropic/claude-sonnet-4
x-sandbase-route-time-ms: 3

Use these headers to monitor which providers are serving your traffic and optimize accordingly.

Interpreting Routing Headers

Header	Description	Example Values
`x-sandbase-provider`	The provider that served the request	`openrouter`, `anthropic`, `openai`
`x-sandbase-model`	The full model identifier used	`anthropic/claude-sonnet-4`, `openai/gpt-4o`
`x-sandbase-route-time-ms`	Time spent on routing decision (ms)	`1`–`10` (typically < 5ms)
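To see how traffic is distributed, you can collect these headers per request and aggregate them. A standard-library sketch; `summarize_routing` is a hypothetical helper, and the sample `log` entries stand in for header values gathered from real responses.

```python
from collections import Counter
from statistics import median


def summarize_routing(header_log: list[dict[str, str]]) -> dict:
    """Aggregate x-sandbase-* headers: traffic share per provider and median route time."""
    providers = Counter(h.get("x-sandbase-provider", "unknown") for h in header_log)
    total = sum(providers.values())
    route_times = [int(h["x-sandbase-route-time-ms"]) for h in header_log
                   if "x-sandbase-route-time-ms" in h]
    return {
        "share": {p: n / total for p, n in providers.items()},
        "median_route_ms": median(route_times) if route_times else None,
    }


log = [
    {"x-sandbase-provider": "openrouter", "x-sandbase-model": "anthropic/claude-sonnet-4", "x-sandbase-route-time-ms": "3"},
    {"x-sandbase-provider": "anthropic", "x-sandbase-model": "anthropic/claude-sonnet-4", "x-sandbase-route-time-ms": "2"},
    {"x-sandbase-provider": "openrouter", "x-sandbase-model": "openai/gpt-4o", "x-sandbase-route-time-ms": "5"},
    {"x-sandbase-provider": "openrouter", "x-sandbase-model": "openai/gpt-4o", "x-sandbase-route-time-ms": "4"},
]
```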

Observability in Practice

python

import openai

client = openai.OpenAI(
    base_url="https://api.sandbase.ai/v1",
    api_key="sk-sb-your-key"
)

# with_raw_response returns the raw HTTP response (it is not a context manager),
# so the headers are available directly
response = client.chat.completions.with_raw_response.create(
    model="sandbase/auto",
    messages=[{"role": "user", "content": "Hello!"}]
)

provider = response.headers.get("x-sandbase-provider")
model = response.headers.get("x-sandbase-model")
route_time = response.headers.get("x-sandbase-route-time-ms")

print(f"Routed to {model} via {provider} in {route_time}ms")

# parse() returns the ChatCompletion object that create() would have returned
completion = response.parse()
print(completion.choices[0].message.content)
