Model Routing
SandBase acts as an intelligent gateway between your application and multiple LLM providers. When you send a request, SandBase determines the best provider to handle it based on model availability, capabilities, and your configuration.
How Routing Works
When a request arrives at SandBase, it goes through a multi-step routing pipeline:
Request → Authentication → Rate Limiting → Balance Check → Capability Filter → Route Selection → Provider

1. Model Resolution
SandBase maps your requested model name to one or more provider endpoints. For example, claude-sonnet-4 might be available through:
- OpenRouter (default fallback)
- Anthropic Direct (lower latency, optional)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sandbase.ai/v1",
    api_key="sk-sb-your-key"
)

# SandBase resolves "claude-sonnet-4" to the best available provider
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello!"}]
)

2. Capability Filtering
SandBase inspects your request to determine which capabilities are required, then filters out providers that don't support them.
Capabilities detected from request parameters:
| Request Feature | Required Capability |
|---|---|
| tools parameter present | tools |
| Image content in messages | vision |
| reasoning or thinking parameter | thinking |
| response_format: { type: "json_schema" } | json_schema |
| cache_control on messages | cache_control |
| stream: true | stream |
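The mapping in the table above can be sketched as a small detection function followed by a set-based filter. This is an illustration only, not SandBase's implementation: `required_capabilities`, `filter_providers`, and the provider names are hypothetical, and the sketch assumes an OpenAI-style request body where a non-empty `tools` list counts as "present".

```python
def required_capabilities(request: dict) -> set[str]:
    """Infer required capabilities from an OpenAI-style request body (illustrative)."""
    caps: set[str] = set()
    if request.get("tools"):  # assumption: a non-empty tools list counts as present
        caps.add("tools")
    if "reasoning" in request or "thinking" in request:
        caps.add("thinking")
    response_format = request.get("response_format") or {}
    if response_format.get("type") == "json_schema":
        caps.add("json_schema")
    if request.get("stream") is True:
        caps.add("stream")
    for message in request.get("messages", []):
        if "cache_control" in message:
            caps.add("cache_control")
        content = message.get("content")
        if isinstance(content, list):  # multi-part content may carry images
            for part in content:
                if part.get("type") == "image_url":
                    caps.add("vision")
                if "cache_control" in part:
                    caps.add("cache_control")
    return caps


def filter_providers(providers: dict[str, set[str]], needed: set[str]) -> list[str]:
    """Keep only providers whose capability set covers everything needed."""
    return [name for name, caps in providers.items() if needed <= caps]


request = {
    "model": "gpt-4o",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
    "tools": [{"type": "function", "function": {"name": "describe_image"}}],
}

# Hypothetical capability sets for two made-up providers
providers = {
    "provider-a": {"tools", "vision", "stream"},
    "provider-b": {"tools", "stream"},
}

needed = required_capabilities(request)
print(sorted(needed))                       # ['tools', 'vision']
print(filter_providers(providers, needed))  # ['provider-a']
```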
Example: If you send a request with tools and an image in the messages, SandBase only routes to providers that support both tools AND vision for that model.
# This request requires: tools + vision
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
            ]
        }
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "describe_image",
            "description": "Describe the image content",
            "parameters": {"type": "object", "properties": {"description": {"type": "string"}}}
        }
    }]
)

3. Priority-Based Selection
After filtering, SandBase selects from remaining candidates using priority-based routing:
- Direct providers (if configured) — lowest latency, no intermediary
- OpenRouter — universal fallback, supports all models
Priority can be configured per-model in the admin panel. The default behavior routes through OpenRouter unless a direct provider is configured with higher priority.
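As a rough sketch of priority-based selection, the candidates that survive filtering can be sorted by a numeric priority. The `Candidate` type, the specific priority values, and the convention that a lower number is tried first are assumptions made for illustration; the admin panel's actual data model is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    provider: str
    priority: int  # assumption: lower number means tried first


def order_by_priority(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates in the order they would be attempted."""
    return sorted(candidates, key=lambda c: c.priority)


# Hypothetical configuration: a direct provider ranked ahead of the fallback
candidates = [
    Candidate(provider="openrouter", priority=100),
    Candidate(provider="anthropic", priority=1),
]

ordered = order_by_priority(candidates)
print([c.provider for c in ordered])  # ['anthropic', 'openrouter']
```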
Fallback Behavior
If the primary provider fails (5xx error, timeout, or rate limit), SandBase automatically falls back to the next available provider:
Primary Provider (Anthropic Direct)
↓ fails
Fallback Provider (OpenRouter)
↓ fails
Error returned to client

Automatic Retry vs Fallback
| Scenario | Behavior |
|---|---|
| Provider returns 5xx | Retry once, then fall back to the next provider |
| Provider returns 429 (rate limited) | Immediate fallback to next provider |
| Provider timeout (no response in 60s) | Fallback to next provider |
| All providers fail | Return error to client with details |
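The retry and fallback rules in the table can be modelled with a short loop. This is a simulation under stated assumptions, not SandBase's code: `ProviderError`, `call_with_fallback`, and `make_provider` are hypothetical names, and the fake providers raise errors instead of making network calls.

```python
class ProviderError(Exception):
    """Simulated provider failure; kind is '5xx', '429', or 'timeout'."""

    def __init__(self, kind: str):
        super().__init__(kind)
        self.kind = kind


def call_with_fallback(providers, request):
    """Try providers in order; a 5xx earns one retry, anything else falls back at once."""
    errors = []
    for name, send in providers:
        attempts = 0
        while True:
            attempts += 1
            try:
                return name, send(request)
            except ProviderError as exc:
                errors.append((name, exc.kind))
                if exc.kind == "5xx" and attempts < 2:
                    continue  # retry the same provider once
                break  # move on to the next provider
    raise RuntimeError(f"all providers failed: {errors}")


def make_provider(failures):
    """Build a fake provider that raises the given error kinds in order, then succeeds."""
    remaining = list(failures)
    calls = []

    def send(request):
        calls.append(request)
        if remaining:
            raise ProviderError(remaining.pop(0))
        return "ok"

    send.calls = calls
    return send


primary = make_provider(["429"])  # rate limited on the first call
fallback = make_provider([])      # always succeeds
served_by, result = call_with_fallback(
    [("anthropic", primary), ("openrouter", fallback)],
    {"model": "claude-sonnet-4"},
)
print(served_by, len(primary.calls))  # openrouter 1
```

A real gateway would also enforce the 60-second timeout and attach the collected errors to the response returned to the client; both are left out of this sketch.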
Capability-Aware Routing Errors
If no provider supports all required capabilities for your request, SandBase returns a clear error:
{
  "error": {
    "type": "capability_unsupported",
    "message": "No available provider supports all required capabilities for this request.",
    "detail": {
      "required_capabilities": ["thinking", "tools"],
      "missing_for_all_candidates": ["thinking"]
    }
  }
}

HTTP Status: 400 Bad Request
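On the client side, the error body shown above can be inspected to find out which capability blocked the request. The helper below is a minimal sketch that assumes the response has exactly the shape shown; `missing_capabilities` is a hypothetical name, not part of any SDK.

```python
import json


def missing_capabilities(status: int, body: str) -> list[str]:
    """Return capabilities that no candidate supports, or [] for any other response."""
    if status != 400:
        return []
    error = json.loads(body).get("error", {})
    if error.get("type") != "capability_unsupported":
        return []
    return error.get("detail", {}).get("missing_for_all_candidates", [])


# The error payload from the example above
body = json.dumps({
    "error": {
        "type": "capability_unsupported",
        "message": "No available provider supports all required capabilities for this request.",
        "detail": {
            "required_capabilities": ["thinking", "tools"],
            "missing_for_all_candidates": ["thinking"],
        },
    }
})

print(missing_capabilities(400, body))  # ['thinking']
```

A caller could use the result to drop the unsupported feature and resend the request, or to surface a clearer message to the user.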
Capability Degradation Rules
| Capability | Behavior When Unsupported |
|---|---|
| tools | Hard filter: request fails if no provider supports it |
| thinking | Hard filter: user explicitly requested reasoning |
| vision | Hard filter: images cannot be processed without it |
| cache_control | Soft: silently ignored, doesn't affect correctness |
| json_schema | Attempts json_mode fallback, warns in response |
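The degradation rules above can be read as a small decision function. The sketch below is illustrative: `plan_request` and its return shape are hypothetical, and it covers only the five capabilities listed in the table (the table does not say how an unsupported `stream` is handled, so that case is left out).

```python
HARD_FILTERS = {"tools", "thinking", "vision"}


def plan_request(required: set[str], supported: set[str]) -> dict:
    """Decide whether a provider can serve a request, degrading soft capabilities."""
    missing = required - supported
    hard_missing = sorted(missing & HARD_FILTERS)
    if hard_missing:
        # Hard filters: the provider is not a valid candidate
        return {"ok": False, "missing": hard_missing, "warnings": [], "dropped": []}
    warnings, dropped = [], []
    if "json_schema" in missing:
        warnings.append("json_schema unavailable; falling back to json_mode")
    if "cache_control" in missing:
        dropped.append("cache_control")  # silently ignored, no warning
    return {"ok": True, "missing": [], "warnings": warnings, "dropped": dropped}


soft = plan_request({"tools", "cache_control", "json_schema"}, {"tools"})
hard = plan_request({"thinking", "tools"}, {"tools"})
print(soft["ok"], soft["dropped"])  # True ['cache_control']
print(hard["ok"], hard["missing"])  # False ['thinking']
```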
OpenRouter as Universal Fallback
OpenRouter is the default fallback for every model and every protocol (OpenAI, Anthropic, Gemini). This means:
- New users only need a SandBase API key — OpenRouter handles all routing behind the scenes
- Production users can add direct provider connections for lower latency or compliance requirements
When to Use Direct Providers
| Scenario | Recommendation |
|---|---|
| Prototyping / development | OpenRouter only (simplest setup) |
| Latency-sensitive production | Add Anthropic/OpenAI direct connections |
| Enterprise compliance (no third-party proxy) | Direct connections required |
| High-volume cost optimization | Direct connections (avoid OpenRouter markup) |
| Accessing beta/preview models | Direct connections (OpenRouter may lag 1-2 weeks) |
Model Aliases
SandBase supports model aliases for convenience:
| Alias | Resolves To |
|---|---|
| gpt-4o | openai/gpt-4o (latest stable) |
| claude-sonnet-4 | anthropic/claude-sonnet-4 |
| gemini-2.5-pro | google/gemini-2.5-pro-preview |
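Alias resolution amounts to a lookup with a pass-through for names that already carry a provider prefix. The sketch below uses the three aliases from the table; `resolve_model` is a hypothetical helper, and returning unknown names unchanged is an assumption rather than documented behavior.

```python
ALIASES = {
    "gpt-4o": "openai/gpt-4o",
    "claude-sonnet-4": "anthropic/claude-sonnet-4",
    "gemini-2.5-pro": "google/gemini-2.5-pro-preview",
}


def resolve_model(name: str) -> str:
    """Map a short alias to its provider-prefixed name."""
    if "/" in name:
        return name  # already provider-prefixed
    # Assumption: unknown aliases pass through unchanged
    return ALIASES.get(name, name)


print(resolve_model("gpt-4o"))         # openai/gpt-4o
print(resolve_model("openai/gpt-4o"))  # openai/gpt-4o
```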
You can always use the full provider-prefixed name for explicit routing:
# These are equivalent
response = client.chat.completions.create(model="gpt-4o", ...)
response = client.chat.completions.create(model="openai/gpt-4o", ...)

Routing Strategy Comparison
SandBase supports multiple routing strategies that determine how requests are distributed across providers. Choose a strategy based on your application's priorities.
Strategy Overview
| Strategy | Optimizes For | Trade-off | Best For |
|---|---|---|---|
| Priority-based (default) | Reliability + preference | May not be cheapest | Production apps with preferred providers |
| Cost-optimized | Lowest token cost | May have higher latency | High-volume batch processing |
| Latency-optimized | Fastest response time | May cost more | Real-time chat, interactive UIs |
| Quality-optimized | Best model output | Highest cost | Critical tasks (code review, legal) |
| Availability-optimized | Highest uptime | Less predictable cost | Mission-critical systems |
Detailed Comparison
| Dimension | Priority | Cost | Latency | Quality | Availability |
|---|---|---|---|---|---|
| Selection logic | Admin-defined order | Cheapest provider first | Lowest p50 latency first | Highest quality score first | Healthiest provider first |
| Fallback behavior | Next in priority list | Next cheapest | Next fastest | Next highest quality | Next most available |
| Configuration | Set provider priorities | Automatic | Automatic | Set quality scores | Automatic (health checks) |
| Predictability | High | Medium | Medium | High | Low |
| Typical latency | Low–Medium | Medium–High | Lowest | Medium | Variable |
| Typical cost | Medium | Lowest | Medium–High | Highest | Medium |
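The "Selection logic" row can be expressed as one sort key per strategy. Everything in this sketch is assumed for illustration: the `Provider` fields, the placeholder provider names, and all numeric values are invented, and the real scoring inputs may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    priority: int          # admin-defined order, lower is tried first (assumed)
    price_per_mtok: float  # hypothetical price per million tokens
    p50_latency_ms: float
    quality: float         # hypothetical score in [0, 1]
    health: float          # hypothetical score in [0, 1]


# One sort key per strategy; negated keys sort descending
SORT_KEYS = {
    "priority": lambda p: p.priority,
    "cost": lambda p: p.price_per_mtok,
    "latency": lambda p: p.p50_latency_ms,
    "quality": lambda p: -p.quality,
    "availability": lambda p: -p.health,
}


def rank(providers: list[Provider], strategy: str) -> list[str]:
    """Return provider names in the order the strategy would try them."""
    return [p.name for p in sorted(providers, key=SORT_KEYS[strategy])]


providers = [
    Provider("provider-a", 1, 3.0, 700, 0.9, 0.95),
    Provider("provider-b", 2, 1.0, 900, 0.7, 0.99),
    Provider("provider-c", 3, 5.0, 400, 0.8, 0.90),
]

for strategy in SORT_KEYS:
    print(strategy, rank(providers, strategy))
```

The first name in each ranking is the top candidate; the rest form the fallback order for that strategy.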
Strategy Flow Diagram
flowchart TD
A[Incoming Request] --> B{Capability Filter}
B -->|No capable provider| E[Return 400 Error]
B -->|Candidates found| C{Routing Strategy}
C -->|Priority| D1[Sort by admin-defined priority]
C -->|Cost| D2[Sort by token price ascending]
C -->|Latency| D3[Sort by p50 response time]
C -->|Quality| D4[Sort by quality score descending]
C -->|Availability| D5[Sort by health score descending]
D1 --> F[Select Top Candidate]
D2 --> F
D3 --> F
D4 --> F
D5 --> F
F --> G{Provider Responds?}
G -->|Success| H[Return Response]
G -->|Failure| I[Fallback to Next Candidate]
I --> G

Smart Routing (Virtual Models)
SandBase provides virtual model names that automatically select the best real model for your request. Instead of specifying a model directly, pass one of the following virtual names in the model field; each maps to a routing strategy:
| Virtual Model | Strategy | Description |
|---|---|---|
| sandbase/auto | Balanced | Best quality-to-cost ratio across all models |
| sandbase/fast | Latency | Selects the fastest responding model |
| sandbase/cheap | Cost | Selects the cheapest capable model |
| sandbase/best | Quality | Selects the highest-quality model available |
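One way to picture virtual models is as a scoring function applied to every capable model, with the highest score winning. The sketch below is hypothetical throughout: the `Model` fields, the placeholder model names, and the numbers are invented, and the quality-to-cost ratio used for sandbase/auto is only a literal reading of the table's description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    quality: float     # hypothetical score in [0, 1]
    cost: float        # hypothetical price per million tokens
    latency_ms: float  # hypothetical typical response time


# Higher score wins for every virtual model
SCORERS = {
    "sandbase/auto": lambda m: m.quality / m.cost,
    "sandbase/fast": lambda m: -m.latency_ms,
    "sandbase/cheap": lambda m: -m.cost,
    "sandbase/best": lambda m: m.quality,
}


def resolve_virtual(virtual_name: str, models: list[Model]) -> str:
    """Pick the real model a virtual model name would resolve to."""
    return max(models, key=SCORERS[virtual_name]).name


models = [
    Model("model-a", quality=0.9, cost=10.0, latency_ms=900),
    Model("model-b", quality=0.7, cost=1.0, latency_ms=500),
    Model("model-c", quality=0.8, cost=4.0, latency_ms=300),
]

print(resolve_virtual("sandbase/auto", models))  # model-b
print(resolve_virtual("sandbase/best", models))  # model-a
```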
How Smart Routing Works
sequenceDiagram
participant App as Your App
participant SB as SandBase
participant Router as Route Engine
participant Provider as LLM Provider
App->>SB: POST /v1/chat/completions<br/>model: "sandbase/auto"
SB->>Router: Resolve virtual model
Router->>Router: Score candidates<br/>(quality / cost ratio)
Router->>Router: Filter by capabilities
Router-->>SB: Selected: claude-sonnet-4
SB->>Provider: Forward request
Provider-->>SB: Response
SB-->>App: Response + headers<br/>x-sandbase-model: anthropic/claude-sonnet-4

Code Examples
Python — Using the OpenAI SDK
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sandbase.ai/v1",
    api_key="sk-sb-your-key"
)

# Direct model selection (priority-based routing across providers)
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Smart routing: let SandBase pick the best model
response = client.chat.completions.create(
    model="sandbase/auto",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Cost-optimized: cheapest model that can handle the task
response = client.chat.completions.create(
    model="sandbase/cheap",
    messages=[{"role": "user", "content": "Summarize this text: ..."}]
)

# Latency-optimized: fastest response
response = client.chat.completions.create(
    model="sandbase/fast",
    messages=[{"role": "user", "content": "Quick: what is 2+2?"}]
)

# Quality-optimized: best model regardless of cost
response = client.chat.completions.create(
    model="sandbase/best",
    messages=[{"role": "user", "content": "Review this contract for legal issues: ..."}]
)

# Check which model was actually used
print(response.model)  # e.g., "anthropic/claude-sonnet-4"

JavaScript — Using the OpenAI SDK
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.sandbase.ai/v1",
  apiKey: "sk-sb-your-key",
});

// Smart routing with streaming
const stream = await client.chat.completions.create({
  model: "sandbase/auto",
  messages: [{ role: "user", content: "Write a haiku about routing" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

cURL — Direct API Call
# Smart routing
curl -X POST https://api.sandbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-sb-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sandbase/auto",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Check routing headers in the response
curl -v -X POST https://api.sandbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-sb-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }' 2>&1 | grep "x-sandbase"

# Response headers:
# x-sandbase-provider: openrouter
# x-sandbase-model: anthropic/claude-sonnet-4
# x-sandbase-route-time-ms: 3

Python — Anthropic SDK (via SandBase)
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.sandbase.ai",
    api_key="sk-sb-your-key"
)

# SandBase routes through the Anthropic-compatible endpoint
message = client.messages.create(
    model="claude-sonnet-4",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

Routing Decision Lifecycle
The complete lifecycle of a routing decision:
┌─────────────────────────────────────────────────────────────────────┐
│ Request Arrives │
└─────────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ 1. Authentication + Rate Limiting + Balance Check │
└─────────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ 2. Model Resolution │
│ • "claude-sonnet-4" → [OpenRouter, Anthropic Direct] │
│ • "sandbase/auto" → [All enabled LLM models] │
└─────────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ 3. Capability Filtering │
│ • Detect: tools, vision, thinking, json_schema, etc. │
│ • Remove providers missing required capabilities │
└─────────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ 4. Strategy-Based Ranking │
│ • Priority: admin-defined order │
│ • Cost: sort by token price │
│ • Latency: sort by response time │
│ • Quality: sort by quality score │
└─────────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ 5. Execute + Fallback │
│ • Try top candidate │
│ • On failure: retry once → fallback to next candidate │
└─────────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ 6. Response + Routing Metadata Headers │
│ • x-sandbase-provider, x-sandbase-model, x-sandbase-route-time-ms │
└─────────────────────────────────────────────────────────────────────┘

Best Practices
Choosing a Routing Strategy
| Your Priority | Recommended Approach |
|---|---|
| Prototyping / exploration | Use sandbase/auto — zero config, good defaults |
| Cost-sensitive batch jobs | Use sandbase/cheap or specify a budget model directly |
| User-facing chat | Use sandbase/fast or specify a low-latency model (e.g., gpt-4o-mini) |
| High-stakes tasks | Use sandbase/best or specify a frontier model directly |
| Production with SLAs | Use direct model names + configure provider priorities |
General Recommendations
- Start with sandbase/auto during development. It gives you good quality at reasonable cost without locking into a specific model.
- Use direct model names in production when you need predictable behavior. Smart routing is great for exploration, but production systems benefit from explicit model selection.
- Configure provider priorities for your most-used models. If you have an Anthropic direct key, set it as priority 1 for Claude models to reduce latency.
- Monitor routing headers to understand where your traffic goes. The x-sandbase-provider header tells you which provider served each request.
- Set up fallback chains for critical paths. SandBase automatically falls back on provider failure, but you can configure the fallback order.
- Use capability-aware requests: don't include tools or image content unless the request needs them. Extra capabilities narrow the candidate pool and may increase cost.
Cost Optimization Tips
- Use sandbase/cheap for non-critical tasks (summarization, classification)
- Avoid frontier models for simple tasks; gpt-4o-mini handles most routine work
- Enable prompt caching (cache_control) for repeated system prompts
- Monitor your routing distribution in the dashboard to spot cost outliers
Latency Optimization Tips
- Add direct provider connections (Anthropic, OpenAI) to skip the OpenRouter hop
- Use sandbase/fast for interactive applications
- Enable streaming (stream: true) for perceived-latency improvement
- Keep prompts concise; token count directly affects response time
Monitoring Routing Decisions
The response headers include routing metadata:
x-sandbase-provider: openrouter
x-sandbase-model: anthropic/claude-sonnet-4
x-sandbase-route-time-ms: 3

Use these headers to monitor which providers are serving your traffic and optimize accordingly.
Interpreting Routing Headers
| Header | Description | Example Values |
|---|---|---|
| x-sandbase-provider | The provider that served the request | openrouter, anthropic, openai |
| x-sandbase-model | The full model identifier used | anthropic/claude-sonnet-4, openai/gpt-4o |
| x-sandbase-route-time-ms | Time spent on routing decision (ms) | 1–10 (typically < 5ms) |
Observability in Practice
import openai

client = openai.OpenAI(
    base_url="https://api.sandbase.ai/v1",
    api_key="sk-sb-your-key"
)

# with_raw_response returns the raw HTTP response object (it is not a
# context manager), which exposes the headers alongside the parsed body
response = client.chat.completions.with_raw_response.create(
    model="sandbase/auto",
    messages=[{"role": "user", "content": "Hello!"}]
)

provider = response.headers.get("x-sandbase-provider")
model = response.headers.get("x-sandbase-model")
route_time = response.headers.get("x-sandbase-route-time-ms")
print(f"Routed to {model} via {provider} in {route_time}ms")

# parse() returns the usual ChatCompletion object
completion = response.parse()
print(completion.choices[0].message.content)
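To track routing over many requests, the headers collected per response can be tallied offline. The helper below is a sketch with a hypothetical name (`summarize_routing`) and made-up sample data; it assumes each item is a plain mapping of the three routing headers described above.

```python
from collections import Counter


def summarize_routing(header_sets):
    """Count requests per provider and average the routing time in milliseconds."""
    by_provider = Counter()
    route_times = []
    for headers in header_sets:
        by_provider[headers.get("x-sandbase-provider", "unknown")] += 1
        raw = headers.get("x-sandbase-route-time-ms")
        if raw is not None:
            route_times.append(float(raw))
    average = sum(route_times) / len(route_times) if route_times else None
    return by_provider, average


# Made-up header samples standing in for values captured from real responses
samples = [
    {"x-sandbase-provider": "openrouter", "x-sandbase-route-time-ms": "3"},
    {"x-sandbase-provider": "anthropic", "x-sandbase-route-time-ms": "2"},
    {"x-sandbase-provider": "openrouter", "x-sandbase-route-time-ms": "4"},
]

counts, avg_ms = summarize_routing(samples)
print(counts.most_common())  # [('openrouter', 2), ('anthropic', 1)]
print(avg_ms)                # 3.0
```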
