Model Routing

SandBase acts as an intelligent gateway between your application and multiple LLM providers. When you send a request, SandBase determines the best provider to handle it based on model availability, capabilities, and your configuration.

How Routing Works

When a request arrives at SandBase, it goes through a multi-step routing pipeline:

Request → Authentication → Rate Limiting → Balance Check → Model Resolution → Capability Filter → Route Selection → Provider

1. Model Resolution

SandBase maps your requested model name to one or more provider endpoints. For example, claude-sonnet-4 might be available through:

  • OpenRouter (default fallback)
  • Anthropic Direct (lower latency, optional)

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sandbase.ai/v1",
    api_key="sk-sb-your-key"
)

# SandBase resolves "claude-sonnet-4" to the best available provider
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello!"}]
)

2. Capability Filtering

SandBase inspects your request to determine which capabilities are required, then filters out providers that don't support them.

Capabilities detected from request parameters:

| Request Feature | Required Capability |
| --- | --- |
| tools parameter present | tools |
| Image content in messages | vision |
| reasoning or thinking parameter | thinking |
| response_format: { type: "json_schema" } | json_schema |
| cache_control on messages | cache_control |
| stream: true | stream |

Example: If you send a request with tools and an image in the messages, SandBase only routes to providers that support both tools AND vision for that model.

python
# This request requires: tools + vision
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
            ]
        }
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "describe_image",
            "description": "Describe the image content",
            "parameters": {"type": "object", "properties": {"description": {"type": "string"}}}
        }
    }]
)
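
The detection rules in the table above can be sketched as a small function. This is a minimal illustration, not SandBase's actual implementation: the function name detect_capabilities is hypothetical, and the sketch assumes an OpenAI-style request body in which an empty tools list does not count as "present".

```python
def detect_capabilities(request: dict) -> set[str]:
    """Return the capability names implied by an OpenAI-style request body."""
    caps: set[str] = set()

    if request.get("tools"):  # assumption: an empty list does not require tools
        caps.add("tools")
    if "reasoning" in request or "thinking" in request:
        caps.add("thinking")
    if (request.get("response_format") or {}).get("type") == "json_schema":
        caps.add("json_schema")
    if request.get("stream") is True:
        caps.add("stream")

    for message in request.get("messages", []):
        if "cache_control" in message:
            caps.add("cache_control")
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content carries no images or cache markers
        for part in content:
            if part.get("type") == "image_url":
                caps.add("vision")
            if "cache_control" in part:
                caps.add("cache_control")
    return caps


request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
    ]}],
    "tools": [{"type": "function", "function": {"name": "describe_image"}}],
}
print(sorted(detect_capabilities(request)))  # ['tools', 'vision']
```

Running the same check locally before sending a request is one way to see which capabilities a request will demand from the candidate pool.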

3. Priority-Based Selection

After filtering, SandBase selects from remaining candidates using priority-based routing:

  1. Direct providers (if configured) — lowest latency, no intermediary
  2. OpenRouter — universal fallback, supports all models

Priority can be configured per-model in the admin panel. The default behavior routes through OpenRouter unless a direct provider is configured with higher priority.

Fallback Behavior

If the primary provider fails (5xx error, timeout, or rate limit), SandBase automatically falls back to the next available provider:

Primary Provider (Anthropic Direct)
    ↓ fails
Fallback Provider (OpenRouter)
    ↓ fails
Error returned to client

Automatic Retry vs Fallback

| Scenario | Behavior |
| --- | --- |
| Provider returns 5xx | Retry once, then fall back to the next provider |
| Provider returns 429 (rate limited) | Fall back immediately to the next provider |
| Provider timeout (no response in 60s) | Fall back to the next provider |
| All providers fail | Return an error to the client with details |
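
The retry and fallback rules in this table can be expressed as a short loop. The sketch below is illustrative only: ProviderError, AllProvidersFailed, call_with_fallback, and the send callback are hypothetical names, and a timeout is modeled as a ProviderError whose status is None. Statuses the table does not mention are treated like a 429 here, which is an assumption.

```python
from typing import Callable, Optional


class ProviderError(Exception):
    """A failed upstream call; status is None when the call timed out."""

    def __init__(self, status: Optional[int]):
        super().__init__(f"provider failed with status {status}")
        self.status = status


class AllProvidersFailed(Exception):
    """Raised once every candidate has been tried; carries the failure log."""

    def __init__(self, failures: list):
        super().__init__(f"all providers failed: {failures}")
        self.failures = failures


def call_with_fallback(providers: list[str], send: Callable[[str], str]) -> str:
    """Try providers in order: retry a 5xx once, otherwise move on immediately."""
    failures = []
    for name in providers:
        for attempt in (1, 2):
            try:
                return send(name)
            except ProviderError as err:
                failures.append((name, err.status))
                is_5xx = err.status is not None and 500 <= err.status < 600
                if not (is_5xx and attempt == 1):
                    break  # 429, timeout, or a second 5xx: next provider
    raise AllProvidersFailed(failures)


# Demo with a fake transport: the direct provider always returns 503.
calls = []

def fake_send(name: str) -> str:
    calls.append(name)
    if name == "anthropic":
        raise ProviderError(503)
    return f"response from {name}"

result = call_with_fallback(["anthropic", "openrouter"], fake_send)
print(result)  # response from openrouter
print(calls)   # ['anthropic', 'anthropic', 'openrouter']
```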

Capability-Aware Routing Errors

If no provider supports all required capabilities for your request, SandBase returns a clear error:

json
{
  "error": {
    "type": "capability_unsupported",
    "message": "No available provider supports all required capabilities for this request.",
    "detail": {
      "required_capabilities": ["thinking", "tools"],
      "missing_for_all_candidates": ["thinking"]
    }
  }
}

HTTP Status: 400 Bad Request
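
On the client side, this error body can be inspected to find out which capability blocked the request. The helper below is a sketch that assumes the error shape shown above; the function name missing_capabilities is hypothetical.

```python
import json


def missing_capabilities(status: int, body: str) -> list[str]:
    """Return the capabilities no provider could satisfy, or [] for other errors."""
    if status != 400:
        return []
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []
    error = payload.get("error", {}) if isinstance(payload, dict) else {}
    if not isinstance(error, dict) or error.get("type") != "capability_unsupported":
        return []
    detail = error.get("detail") or {}
    return list(detail.get("missing_for_all_candidates", []))


sample_body = json.dumps({
    "error": {
        "type": "capability_unsupported",
        "message": "No available provider supports all required capabilities for this request.",
        "detail": {
            "required_capabilities": ["thinking", "tools"],
            "missing_for_all_candidates": ["thinking"],
        },
    }
})
print(missing_capabilities(400, sample_body))  # ['thinking']
```

A caller could use the returned list to drop the offending parameter or switch to a different model before retrying.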

Capability Degradation Rules

| Capability | Behavior When Unsupported |
| --- | --- |
| tools | Hard filter: request fails if no provider supports it |
| thinking | Hard filter: user explicitly requested reasoning |
| vision | Hard filter: images cannot be processed without it |
| cache_control | Soft: silently ignored, doesn't affect correctness |
| json_schema | Attempts json_mode fallback, warns in response |
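
The hard and soft rules above can be sketched as a filter over provider capability sets. This is an illustration under stated assumptions, not SandBase's routing code: plan_routes and the provider data are hypothetical, and any capability not listed in the table (such as stream) is treated as a hard requirement here.

```python
# Capabilities that may be dropped or downgraded instead of failing the request.
DEGRADABLE = {"cache_control", "json_schema"}


def plan_routes(required: set[str], providers: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each eligible provider to the capabilities it would degrade."""
    plans: dict[str, list[str]] = {}
    for name, supported in providers.items():
        missing = required - supported
        if missing - DEGRADABLE:
            continue  # a hard capability is missing: provider is filtered out
        plans[name] = sorted(missing)
    return plans


# Hypothetical capability data for two providers of the same model.
providers = {
    "openrouter": {"tools", "vision", "json_schema"},
    "anthropic": {"tools", "vision", "thinking", "cache_control"},
}

print(plan_routes({"tools", "thinking", "json_schema"}, providers))
# {'anthropic': ['json_schema']}
print(plan_routes({"tools", "cache_control"}, providers))
# {'openrouter': ['cache_control'], 'anthropic': []}
```

An empty result corresponds to the capability_unsupported error described earlier.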

OpenRouter as Universal Fallback

OpenRouter is the default fallback for every model and every protocol (OpenAI, Anthropic, Gemini). This means:

  • New users only need a SandBase API key — OpenRouter handles all routing behind the scenes
  • Production users can add direct provider connections for lower latency or compliance requirements

When to Use Direct Providers

| Scenario | Recommendation |
| --- | --- |
| Prototyping / development | OpenRouter only (simplest setup) |
| Latency-sensitive production | Add Anthropic/OpenAI direct connections |
| Enterprise compliance (no third-party proxy) | Direct connections required |
| High-volume cost optimization | Direct connections (avoid OpenRouter markup) |
| Accessing beta/preview models | Direct connections (OpenRouter may lag 1-2 weeks) |

Model Aliases

SandBase supports model aliases for convenience:

| Alias | Resolves To |
| --- | --- |
| gpt-4o | openai/gpt-4o (latest stable) |
| claude-sonnet-4 | anthropic/claude-sonnet-4 |
| gemini-2.5-pro | google/gemini-2.5-pro-preview |

You can always use the full provider-prefixed name for explicit routing:

python
messages = [{"role": "user", "content": "Hello!"}]

# These two calls are equivalent
response = client.chat.completions.create(model="gpt-4o", messages=messages)
response = client.chat.completions.create(model="openai/gpt-4o", messages=messages)
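
Conceptually, alias resolution is a lookup that leaves provider-prefixed names untouched. The sketch below uses the alias table from this section; the function name resolve_model and the choice to raise on unknown names are assumptions, not documented SandBase behavior.

```python
# Alias table copied from the documentation above.
ALIASES = {
    "gpt-4o": "openai/gpt-4o",
    "claude-sonnet-4": "anthropic/claude-sonnet-4",
    "gemini-2.5-pro": "google/gemini-2.5-pro-preview",
}


def resolve_model(name: str) -> str:
    """Expand a short alias to its provider-prefixed model name."""
    if "/" in name:
        return name  # already explicit, e.g. "openai/gpt-4o"
    try:
        return ALIASES[name]
    except KeyError:
        raise ValueError(f"unknown model alias: {name!r}") from None


print(resolve_model("gpt-4o"))         # openai/gpt-4o
print(resolve_model("openai/gpt-4o"))  # openai/gpt-4o
```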

Routing Strategy Comparison

SandBase supports multiple routing strategies that determine how requests are distributed across providers. Choose a strategy based on your application's priorities.

Strategy Overview

| Strategy | Optimizes For | Trade-off | Best For |
| --- | --- | --- | --- |
| Priority-based (default) | Reliability + preference | May not be cheapest | Production apps with preferred providers |
| Cost-optimized | Lowest token cost | May have higher latency | High-volume batch processing |
| Latency-optimized | Fastest response time | May cost more | Real-time chat, interactive UIs |
| Quality-optimized | Best model output | Highest cost | Critical tasks (code review, legal) |
| Availability-optimized | Highest uptime | Less predictable cost | Mission-critical systems |

Detailed Comparison

| Dimension | Priority | Cost | Latency | Quality | Availability |
| --- | --- | --- | --- | --- | --- |
| Selection logic | Admin-defined order | Cheapest provider first | Lowest p50 latency first | Highest quality score first | Healthiest provider first |
| Fallback behavior | Next in priority list | Next cheapest | Next fastest | Next highest quality | Next most available |
| Configuration | Set provider priorities | Automatic | Automatic | Set quality scores | Automatic (health checks) |
| Predictability | High | Medium | Medium | High | Low |
| Typical latency | Low–Medium | Medium–High | Lowest | Medium | Variable |
| Typical cost | Medium | Lowest | Medium–High | Highest | Medium |

Strategy Flow Diagram

mermaid
flowchart TD
    A[Incoming Request] --> B{Capability Filter}
    B -->|No capable provider| E[Return 400 Error]
    B -->|Candidates found| C{Routing Strategy}
    
    C -->|Priority| D1[Sort by admin-defined priority]
    C -->|Cost| D2[Sort by token price ascending]
    C -->|Latency| D3[Sort by p50 response time]
    C -->|Quality| D4[Sort by quality score descending]
    C -->|Availability| D5[Sort by health score descending]
    
    D1 --> F[Select Top Candidate]
    D2 --> F
    D3 --> F
    D4 --> F
    D5 --> F
    
    F --> G{Provider Responds?}
    G -->|Success| H[Return Response]
    G -->|Failure| I[Fallback to Next Candidate]
    I --> G
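
The "sort, then take the top candidate" step of the flowchart can be sketched with one sort key per strategy. Everything in this example is hypothetical: the Candidate fields, the sample numbers, and the convention that a lower priority number is tried first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    priority: int        # assumed: lower number is tried first
    price: float         # token price, arbitrary units
    p50_latency_ms: int
    quality: float       # higher is better
    health: float        # higher is better


# One sort key per strategy; descending sorts negate the value.
SORT_KEYS = {
    "priority": lambda c: c.priority,
    "cost": lambda c: c.price,
    "latency": lambda c: c.p50_latency_ms,
    "quality": lambda c: -c.quality,
    "availability": lambda c: -c.health,
}


def rank(candidates: list[Candidate], strategy: str = "priority") -> list[str]:
    """Return candidate names in the order they would be tried."""
    return [c.name for c in sorted(candidates, key=SORT_KEYS[strategy])]


candidates = [
    Candidate("direct", 1, 3.0, 400, 0.90, 0.95),
    Candidate("openrouter", 2, 3.3, 600, 0.90, 0.99),
    Candidate("budget", 3, 1.0, 900, 0.70, 0.90),
]

print(rank(candidates, "priority"))      # ['direct', 'openrouter', 'budget']
print(rank(candidates, "cost"))          # ['budget', 'direct', 'openrouter']
print(rank(candidates, "availability"))  # ['openrouter', 'direct', 'budget']
```

The ranked list doubles as the fallback order: on failure, the next name in the list is tried.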

Smart Routing (Virtual Models)

SandBase provides virtual model names that automatically select the best real model for your request. Instead of naming a specific model, pass one of the following virtual models in the model field:

| Virtual Model | Strategy | Description |
| --- | --- | --- |
| sandbase/auto | Balanced | Best quality-to-cost ratio across all models |
| sandbase/fast | Latency | Selects the fastest responding model |
| sandbase/cheap | Cost | Selects the cheapest capable model |
| sandbase/best | Quality | Selects the highest-quality model available |

How Smart Routing Works

mermaid
sequenceDiagram
    participant App as Your App
    participant SB as SandBase
    participant Router as Route Engine
    participant Provider as LLM Provider

    App->>SB: POST /v1/chat/completions<br/>model: "sandbase/auto"
    SB->>Router: Resolve virtual model
    Router->>Router: Filter by capabilities
    Router->>Router: Score candidates<br/>(quality / cost ratio)
    Router-->>SB: Selected: claude-sonnet-4
    SB->>Provider: Forward request
    Provider-->>SB: Response
    SB-->>App: Response + headers<br/>x-sandbase-model: anthropic/claude-sonnet-4
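
As a rough illustration of the sandbase/auto step in the diagram, the sketch below filters models by required capabilities and then picks the highest quality-to-cost ratio. The scoring formula, the Model fields, and all numbers are assumptions made for this example; the documentation does not specify how SandBase computes its scores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    quality: float            # hypothetical quality score
    cost: float               # hypothetical token price, must be > 0
    capabilities: frozenset


def pick_auto(models: list[Model], required: set[str]) -> Optional[str]:
    """Pick the capable model with the best quality/cost ratio, if any."""
    capable = [m for m in models if required <= m.capabilities]
    if not capable:
        return None
    return max(capable, key=lambda m: m.quality / m.cost).name


models = [
    Model("anthropic/claude-sonnet-4", 90, 15, frozenset({"tools", "vision"})),
    Model("openai/gpt-4o", 80, 10, frozenset({"tools", "vision"})),
    Model("openai/gpt-4o-mini", 60, 2, frozenset({"vision"})),
]

print(pick_auto(models, set()))      # openai/gpt-4o-mini (ratio 30)
print(pick_auto(models, {"tools"}))  # openai/gpt-4o (ratio 8 beats 6)
```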

Code Examples

Python — Using the OpenAI SDK

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sandbase.ai/v1",
    api_key="sk-sb-your-key"
)

# Direct model selection (priority-based routing across providers)
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Smart routing — let SandBase pick the best model
response = client.chat.completions.create(
    model="sandbase/auto",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Cost-optimized — cheapest model that can handle the task
response = client.chat.completions.create(
    model="sandbase/cheap",
    messages=[{"role": "user", "content": "Summarize this text: ..."}]
)

# Latency-optimized — fastest response
response = client.chat.completions.create(
    model="sandbase/fast",
    messages=[{"role": "user", "content": "Quick: what is 2+2?"}]
)

# Quality-optimized — best model regardless of cost
response = client.chat.completions.create(
    model="sandbase/best",
    messages=[{"role": "user", "content": "Review this contract for legal issues: ..."}]
)

# Check which model was actually used
print(response.model)  # e.g., "anthropic/claude-sonnet-4"

JavaScript — Using the OpenAI SDK

javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.sandbase.ai/v1",
  apiKey: "sk-sb-your-key",
});

// Smart routing with streaming
const stream = await client.chat.completions.create({
  model: "sandbase/auto",
  messages: [{ role: "user", content: "Write a haiku about routing" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

cURL — Direct API Call

bash
# Smart routing
curl -X POST https://api.sandbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-sb-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sandbase/auto",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Check routing headers in the response
curl -v -X POST https://api.sandbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-sb-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }' 2>&1 | grep "x-sandbase"

# Response headers:
# x-sandbase-provider: openrouter
# x-sandbase-model: anthropic/claude-sonnet-4
# x-sandbase-route-time-ms: 3

Python — Anthropic SDK (via SandBase)

python
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.sandbase.ai",
    api_key="sk-sb-your-key"
)

# SandBase routes through the Anthropic-compatible endpoint
message = client.messages.create(
    model="claude-sonnet-4",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

Routing Decision Lifecycle

The complete lifecycle of a routing decision:

┌─────────────────────────────────────────────────────────────────────┐
│                        Request Arrives                                │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│  1. Authentication + Rate Limiting + Balance Check                    │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│  2. Model Resolution                                                 │
│     • "claude-sonnet-4" → [OpenRouter, Anthropic Direct]             │
│     • "sandbase/auto"   → [All enabled LLM models]                   │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│  3. Capability Filtering                                             │
│     • Detect: tools, vision, thinking, json_schema, etc.             │
│     • Remove providers missing required capabilities                 │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│  4. Strategy-Based Ranking                                           │
│     • Priority: admin-defined order                                  │
│     • Cost: sort by token price                                      │
│     • Latency: sort by response time                                 │
│     • Quality: sort by quality score                                 │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│  5. Execute + Fallback                                               │
│     • Try top candidate                                              │
│     • On failure: retry once → fallback to next candidate            │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│  6. Response + Routing Metadata Headers                               │
│     • x-sandbase-provider, x-sandbase-model, x-sandbase-route-time-ms│
└─────────────────────────────────────────────────────────────────────┘

Best Practices

Choosing a Routing Strategy

| Your Priority | Recommended Approach |
| --- | --- |
| Prototyping / exploration | Use sandbase/auto (zero config, good defaults) |
| Cost-sensitive batch jobs | Use sandbase/cheap or specify a budget model directly |
| User-facing chat | Use sandbase/fast or specify a low-latency model (e.g., gpt-4o-mini) |
| High-stakes tasks | Use sandbase/best or specify a frontier model directly |
| Production with SLAs | Use direct model names + configure provider priorities |

General Recommendations

  1. Start with sandbase/auto during development. It gives you good quality at reasonable cost without locking into a specific model.

  2. Use direct model names in production when you need predictable behavior. Smart routing is great for exploration, but production systems benefit from explicit model selection.

  3. Configure provider priorities for your most-used models. If you have an Anthropic direct key, set it as priority 1 for Claude models to reduce latency.

  4. Monitor routing headers to understand where your traffic goes. The x-sandbase-provider header tells you which provider served each request.

  5. Set up fallback chains for critical paths. SandBase automatically falls back on provider failure, but you can configure the fallback order.

  6. Use capability-aware requests — don't send tools or vision parameters unless needed. Extra capabilities narrow the candidate pool and may increase cost.

Cost Optimization Tips

  • Use sandbase/cheap for non-critical tasks (summarization, classification)
  • Avoid frontier models for simple tasks — gpt-4o-mini handles most routine work
  • Enable prompt caching (cache_control) for repeated system prompts
  • Monitor your routing distribution in the dashboard to spot cost outliers

Latency Optimization Tips

  • Add direct provider connections (Anthropic, OpenAI) to skip the OpenRouter hop
  • Use sandbase/fast for interactive applications
  • Enable streaming (stream: true) for perceived-latency improvement
  • Keep prompts concise — token count directly affects response time

Monitoring Routing Decisions

The response headers include routing metadata:

x-sandbase-provider: openrouter
x-sandbase-model: anthropic/claude-sonnet-4
x-sandbase-route-time-ms: 3

Use these headers to monitor which providers are serving your traffic and optimize accordingly.

Interpreting Routing Headers

| Header | Description | Example Values |
| --- | --- | --- |
| x-sandbase-provider | The provider that served the request | openrouter, anthropic, openai |
| x-sandbase-model | The full model identifier used | anthropic/claude-sonnet-4, openai/gpt-4o |
| x-sandbase-route-time-ms | Time spent on the routing decision (ms) | 1 to 10 (typically < 5 ms) |
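
To see where traffic goes over time, the headers from many responses can be aggregated. The helper below is a minimal sketch: summarize_routing is a hypothetical name, and it assumes each response's headers have already been collected into a plain dict with lowercase keys.

```python
from collections import Counter
from typing import Optional


def summarize_routing(header_sets: list[dict]) -> tuple[Counter, Optional[float]]:
    """Count requests per provider and average the routing time in ms."""
    by_provider: Counter = Counter()
    route_times: list[float] = []
    for headers in header_sets:
        by_provider[headers.get("x-sandbase-provider", "unknown")] += 1
        raw = headers.get("x-sandbase-route-time-ms")
        try:
            route_times.append(float(raw))
        except (TypeError, ValueError):
            pass  # header missing or not numeric: skip it
    mean = sum(route_times) / len(route_times) if route_times else None
    return by_provider, mean


samples = [
    {"x-sandbase-provider": "openrouter", "x-sandbase-route-time-ms": "3"},
    {"x-sandbase-provider": "openrouter", "x-sandbase-route-time-ms": "5"},
    {"x-sandbase-provider": "anthropic", "x-sandbase-route-time-ms": "4"},
]
counts, mean_ms = summarize_routing(samples)
print(dict(counts))  # {'openrouter': 2, 'anthropic': 1}
print(mean_ms)       # 4.0
```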

Observability in Practice

python
import openai

client = openai.OpenAI(
    base_url="https://api.sandbase.ai/v1",
    api_key="sk-sb-your-key"
)

# Use with_raw_response to read the HTTP headers alongside the parsed body
response = client.chat.completions.with_raw_response.create(
    model="sandbase/auto",
    messages=[{"role": "user", "content": "Hello!"}]
)

provider = response.headers.get("x-sandbase-provider")
model = response.headers.get("x-sandbase-model")
route_time = response.headers.get("x-sandbase-route-time-ms")

print(f"Routed to {model} via {provider} in {route_time}ms")

# parse() returns the usual ChatCompletion object
completion = response.parse()
print(completion.choices[0].message.content)