Model Capabilities

Each model supports a different set of capabilities. SandBase uses this information for capability-aware routing — if your request requires a capability that a provider doesn't support, SandBase automatically routes to one that does.

Capability Matrix

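| Model | Chat | Streaming | Tools | Vision | Thinking | JSON Mode | Cache |
|---|---|---|---|---|---|---|---|
| openai/gpt-4o | Yes | Yes | Yes | Yes | No | Yes | Yes |
| openai/gpt-4o-mini | Yes | Yes | Yes | Yes | No | Yes | Yes |
| openai/o3 | Yes | Yes | Yes | Yes | Yes | Yes | No |
| openai/o3-mini | Yes | Yes | Yes | No | Yes | Yes | No |
| anthropic/claude-sonnet-4 | Yes | Yes | Yes | Yes | Yes | No | Yes |
| anthropic/claude-3.5-haiku | Yes | Yes | Yes | Yes | No | No | Yes |
| deepseek/deepseek-v3 | Yes | Yes | Yes | No | No | Yes | Yes |
| deepseek/deepseek-chat | Yes | Yes | Yes | No | No | Yes | Yes |
| deepseek/deepseek-reasoner | Yes | Yes | No | No | Yes | No | No |
| google/gemini-2.5-pro | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| google/gemini-2.5-flash | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| meta/llama-4-maverick | Yes | Yes | Yes | Yes | No | No | No |
| alibaba/qwen3-32b | Yes | Yes | Yes | No | No | No | No |
| bytedance/seed-1.6 | Yes | Yes | Yes | Yes | No | No | No |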

Capability Definitions

Chat

All models support basic chat completion — sending messages and receiving a response. This is the fundamental capability.

Streaming

Server-Sent Events (SSE) streaming for token-by-token response delivery. All models on SandBase support streaming.

See the Streaming Guide for implementation details.
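As a rough illustration of what token-by-token delivery looks like to a client, the sketch below assembles text from SSE lines. It assumes SandBase emits OpenAI-style chunks (`data: {...}` lines carrying `choices[].delta.content`, ending with `data: [DONE]`). That wire format and the helper name `collect_stream_text` are assumptions, not something this page states, so treat the Streaming Guide as the authority.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta content from OpenAI-style SSE lines (assumed format)."""
    parts = []
    for raw in lines:
        line = raw.strip()
        # SSE payload lines start with "data:"; skip blank lines and anything else.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


# Hand-written sample events, not captured from a real response.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))
```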

Tools (Function Calling)

The ability to define functions that the model can call. The model returns structured tool call requests that your application executes.

python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]

Models without tools support: deepseek/deepseek-reasoner (reasoning-only model)
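The definition above only declares the tool. Your application still has to execute the call the model asks for. A minimal sketch of that step follows, assuming the OpenAI-style tool call shape (an `id` plus a `function` object whose `arguments` is a JSON string). The `get_weather` body, `TOOL_REGISTRY`, and `run_tool_call` are hypothetical names used for illustration only.

```python
import json


def get_weather(location: str) -> str:
    """Stand-in implementation; a real one would query a weather service."""
    return f"Sunny in {location}"


# Map tool names (as declared in `tools`) to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and build the follow-up `tool` message."""
    name = tool_call["function"]["name"]
    # Arguments arrive as a JSON-encoded string in the assumed format.
    args = json.loads(tool_call["function"]["arguments"])
    if name not in TOOL_REGISTRY:
        result = f"Unknown tool: {name}"
    else:
        result = TOOL_REGISTRY[name](**args)
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": result}


# Hand-written example of what a model might return, not real output.
example_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"location": "Paris"}'},
}
print(run_tool_call(example_call))
```

The returned message would typically be appended to `messages` and sent back so the model can produce its final answer.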

Vision

The ability to process images in the input. Send images as URLs or base64-encoded data:

python
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
    ]
}]

Models without vision: openai/o3-mini, deepseek/deepseek-v3, deepseek/deepseek-chat, deepseek/deepseek-reasoner, alibaba/qwen3-32b
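The example above uses an image URL. For the base64 case, a common convention in OpenAI-style APIs is to put a `data:` URL in the same `image_url` field. The sketch below assumes SandBase accepts that form; the helper name `image_message` is hypothetical.

```python
import base64


def image_message(prompt: str, data: bytes, mime: str = "image/png") -> dict:
    """Build a user message that embeds raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Assumed: the API accepts data URLs wherever it accepts image URLs.
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }


# Placeholder bytes (the PNG file signature), standing in for a real image file.
fake_png = b"\x89PNG\r\n\x1a\n"
messages = [image_message("What's in this image?", fake_png)]
print(messages[0]["content"][1]["image_url"]["url"][:22])
```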

Thinking (Reasoning)

Extended reasoning capabilities where the model "thinks" before responding. This produces higher-quality answers for complex problems at the cost of more tokens and latency.

How to enable (the parameter differs by provider):

OpenAI reasoning models, via reasoning_effort:

python
response = client.chat.completions.create(
    model="openai/o3",
    messages=[{"role": "user", "content": "Solve this step by step..."}],
    reasoning_effort="high"  # low, medium, high
)

Anthropic models, via a thinking budget on the Messages API:

python
response = client.messages.create(
    model="anthropic/claude-sonnet-4",
    max_tokens=8000,  # must exceed budget_tokens
    thinking={
        "type": "enabled",
        "budget_tokens": 5000
    },
    messages=[{"role": "user", "content": "Solve this step by step..."}]
)

Gemini models, via thinkingBudget in extra_body:

python
response = client.chat.completions.create(
    model="google/gemini-2.5-pro",
    messages=[{"role": "user", "content": "Solve this step by step..."}],
    extra_body={
        "gemini": {"thinkingBudget": 5000}
    }
)

Models with thinking: openai/o3, openai/o3-mini, anthropic/claude-sonnet-4, deepseek/deepseek-reasoner, google/gemini-2.5-pro, google/gemini-2.5-flash
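Because the thinking parameter differs by provider, client code that targets several models may want one place to pick the right keyword arguments. The sketch below simply mirrors the three examples above. The function name `thinking_kwargs` is hypothetical, and providers this page shows no example for (such as deepseek) are rejected rather than guessed at.

```python
def thinking_kwargs(model: str, effort: str = "high", budget_tokens: int = 5000) -> dict:
    """Return provider-specific keyword arguments that enable thinking."""
    provider = model.split("/", 1)[0]
    if provider == "openai":
        return {"reasoning_effort": effort}  # low, medium, high
    if provider == "anthropic":
        # Intended for the Messages API call shown above, not chat.completions.
        return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
    if provider == "google":
        return {"extra_body": {"gemini": {"thinkingBudget": budget_tokens}}}
    # No parameter is documented on this page for other providers.
    raise ValueError(f"no documented thinking parameter for {model!r}")


for name in ("openai/o3", "anthropic/claude-sonnet-4", "google/gemini-2.5-pro"):
    print(name, thinking_kwargs(name))
```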

JSON Mode

Guaranteed JSON output. On models that support JSON mode, the response is constrained to be syntactically valid JSON:

python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "List 3 colors as JSON"}],
    response_format={"type": "json_object"}
)
# Response is guaranteed to be valid JSON

For stricter control, use JSON Schema mode (supported by OpenAI and Gemini models):

python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "List 3 colors"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "colors",
            "schema": {
                "type": "object",
                "properties": {
                    "colors": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["colors"]
            }
        }
    }
)

Models with JSON mode: openai/gpt-4o, openai/gpt-4o-mini, openai/o3, openai/o3-mini, deepseek/deepseek-v3, deepseek/deepseek-chat, google/gemini-2.5-pro, google/gemini-2.5-flash
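Valid JSON is not necessarily JSON of the shape you expected, particularly in plain `json_object` mode, so it can be worth checking the parsed result on the client. A minimal sketch, assuming the reply text is available as a string and should match the `colors` schema above; `parse_colors` is a hypothetical helper.

```python
import json
from typing import List


def parse_colors(content: str) -> List[str]:
    """Parse a JSON-mode reply and check it against the `colors` shape."""
    data = json.loads(content)  # raises json.JSONDecodeError on invalid JSON
    colors = data.get("colors") if isinstance(data, dict) else None
    if not isinstance(colors, list) or not all(isinstance(c, str) for c in colors):
        raise ValueError("expected an object with a 'colors' array of strings")
    return colors


# Hand-written reply text, standing in for the model's message content.
reply = '{"colors": ["red", "green", "blue"]}'
print(parse_colors(reply))
```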

Cache (Prompt Caching)

Prompt caching reduces costs for repeated prompts. Cached input tokens are billed at a significant discount:

| Provider | Cache Discount |
|---|---|
| Anthropic | 90% off input price |
| OpenAI | 50% off input price |
| DeepSeek | 90% off input price |
| Google | Varies by model |
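To make the discounts concrete, here is a small worked calculation. The price of $2.00 per million input tokens is a made-up figure for illustration, and the sketch ignores any provider-specific charge for writing to the cache. Only the discount percentages come from the table above.

```python
# Fraction taken off the input price for cached tokens, from the table above.
CACHE_DISCOUNT = {"anthropic": 0.90, "openai": 0.50, "deepseek": 0.90}


def input_cost(provider: str, input_tokens: int, cached_tokens: int,
               price_per_million: float) -> float:
    """Input cost in dollars when `cached_tokens` of the input hit the cache."""
    discount = CACHE_DISCOUNT[provider]
    uncached_tokens = input_tokens - cached_tokens
    cached_price = price_per_million * (1 - discount)
    total = uncached_tokens * price_per_million + cached_tokens * cached_price
    return total / 1_000_000


# 10,000 input tokens, 8,000 of them cached, at a hypothetical $2.00 per million.
for provider in ("openai", "anthropic"):
    cost = input_cost(provider, 10_000, 8_000, 2.00)
    print(f"{provider}: ${cost:.4f} (uncached would be $0.0200)")
```

With these numbers the OpenAI-style 50% discount gives $0.0120 and the 90% discount gives $0.0056, compared with $0.0200 with no cache hits.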

How caching works:

  • OpenAI/DeepSeek: Automatic — the provider caches prompts transparently
  • Anthropic: Explicit — you mark cache breakpoints with cache_control
  • Google: Automatic, with an explicit context caching API also available

Models with caching: openai/gpt-4o, openai/gpt-4o-mini, anthropic/claude-sonnet-4, anthropic/claude-3.5-haiku, deepseek/deepseek-v3, deepseek/deepseek-chat, google/gemini-2.5-pro, google/gemini-2.5-flash

How Capabilities Affect Routing

When you send a request, SandBase inspects it to determine required capabilities:

| Request Feature | Required Capability |
|---|---|
| tools parameter | tools |
| Image in messages | vision |
| reasoning_effort or thinking | thinking |
| response_format: json_object | json_mode |
| cache_control markers | cache |

If the requested model doesn't support a required capability through any available provider, SandBase returns a capability_unsupported error with details about what's missing.
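The inspection step can be pictured as a function from a request body to a set of capability names. The sketch below follows the request features listed above. It is an illustration of the idea, not SandBase's actual implementation, and `required_capabilities` is a hypothetical name.

```python
def required_capabilities(request: dict) -> set:
    """Infer capability names from an OpenAI-style request body."""
    caps = set()
    if request.get("tools"):
        caps.add("tools")
    if "reasoning_effort" in request or "thinking" in request:
        caps.add("thinking")
    response_format = request.get("response_format") or {}
    if response_format.get("type") == "json_object":
        caps.add("json_mode")
    for message in request.get("messages", []):
        content = message.get("content")
        # Plain string content has no parts to inspect.
        parts = content if isinstance(content, list) else []
        for part in parts:
            if part.get("type") == "image_url":
                caps.add("vision")
            if "cache_control" in part:
                caps.add("cache")
    return caps


example_request = {
    "model": "openai/gpt-4o",
    "response_format": {"type": "json_object"},
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image as JSON"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
}
print(sorted(required_capabilities(example_request)))
```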

Hard vs Soft Requirements

| Capability | Type | Behavior When Missing |
|---|---|---|
| Tools | Hard | Request fails: cannot process tool calls |
| Vision | Hard | Request fails: cannot process images |
| Thinking | Hard | Request fails: user explicitly requested reasoning |
| JSON Mode | Soft | Falls back to prompt-based JSON (with warning) |
| Cache | Soft | Silently ignored: request works, just no discount |
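The hard/soft split can be sketched as a check that raises for missing hard capabilities and reports missing soft ones so the request can proceed in degraded form. This is a simplified model with one capability set per model. The `CapabilityUnsupported` class and `check_capabilities` function are hypothetical stand-ins, not part of any SandBase SDK.

```python
HARD = {"tools", "vision", "thinking"}
SOFT = {"json_mode", "cache"}


class CapabilityUnsupported(Exception):
    """Local stand-in for the capability_unsupported error (hypothetical class)."""

    def __init__(self, missing):
        super().__init__(f"capability_unsupported: {', '.join(missing)}")
        self.missing = missing


def check_capabilities(required: set, supported: set) -> list:
    """Raise on missing hard capabilities; return missing soft ones."""
    missing = required - supported
    hard_missing = sorted(missing & HARD)
    if hard_missing:
        raise CapabilityUnsupported(hard_missing)
    return sorted(missing & SOFT)


# Capabilities of deepseek/deepseek-reasoner according to the lists on this page.
reasoner = {"chat", "streaming", "thinking"}

# Soft capabilities are missing: the request proceeds in degraded form.
print(check_capabilities({"thinking", "json_mode", "cache"}, reasoner))

# A hard capability is missing: the request fails.
try:
    check_capabilities({"tools"}, reasoner)
except CapabilityUnsupported as err:
    print(err)
```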

Requesting New Models

If you need a model that's not currently available on SandBase, contact us at [email protected]. We regularly add new models based on user demand.