LLM Gateway — Chat Completions
SandBase exposes an OpenAI-compatible Chat Completions endpoint. Any application built with the OpenAI SDK can connect to SandBase by changing the base URL and API key; no other code changes are required.
Endpoint
POST https://api.sandbase.ai/v1/chat/completions
Authentication
```http
Authorization: Bearer sk-sb-your-api-key
```
Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | Yes | none | Model identifier (e.g., `gpt-4o`, `claude-sonnet-4-20250514`, `deepseek-chat`) |
| `messages` | array | Yes | none | Array of message objects representing the conversation |
| `temperature` | number | No | 1.0 | Sampling temperature (0.0 to 2.0). Lower values are more deterministic |
| `top_p` | number | No | 1.0 | Nucleus sampling threshold |
| `max_tokens` | integer | No | Model default | Maximum number of tokens to generate |
| `stream` | boolean | No | false | Enable streaming via Server-Sent Events |
| `stop` | string or array | No | null | Stop sequence(s) that halt generation |
| `tools` | array | No | null | Tool/function definitions for function calling |
| `tool_choice` | string or object | No | "auto" | Controls tool selection behavior |
| `response_format` | object | No | null | Force structured output (JSON mode or JSON schema) |
| `n` | integer | No | 1 | Number of completions to generate |
| `presence_penalty` | number | No | 0 | Penalizes tokens based on whether they already appear in the text so far (-2.0 to 2.0) |
| `frequency_penalty` | number | No | 0 | Penalizes tokens based on how often they appear in the text so far (-2.0 to 2.0) |
| `user` | string | No | null | Unique user identifier for abuse monitoring |
| `stream_options` | object | No | null | Options for streaming (e.g., `include_usage`) |
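The ranges in the table can be checked client-side before a request is sent. The following is a minimal sketch, not part of any SandBase SDK: the helper name `build_chat_request` is hypothetical, and rejecting an empty `messages` list is an assumption rather than documented behavior.

```python
from typing import Any

# Ranges copied from the parameter table above; the helper itself is hypothetical.
_RANGES = {
    "temperature": (0.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}


def build_chat_request(model: str, messages: list[dict[str, Any]], **options: Any) -> dict[str, Any]:
    """Return a request body, rejecting values outside the documented ranges."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        # Assumption: an empty conversation is treated as a client-side error.
        raise ValueError("messages must contain at least one message")
    for name, (low, high) in _RANGES.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    # Options left as None are dropped so the server-side defaults apply.
    body: dict[str, Any] = {"model": model, "messages": messages}
    body.update({key: value for key, value in options.items() if value is not None})
    return body


payload = build_chat_request(
    "gpt-4o",
    [{"role": "user", "content": "Hello"}],
    temperature=0.2,
    max_tokens=64,
)
print(payload)
```

The resulting dictionary can be serialized with `json.dumps` and sent as the POST body, or unpacked into an SDK call.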
Messages Array
Each message object has the following structure:
| Field | Type | Required | Description |
|---|---|---|---|
| `role` | string | Yes | One of: `system`, `user`, `assistant`, `tool` |
| `content` | string or array | Yes | Message content (text or content parts array) |
| `name` | string | No | Optional name for the participant |
| `tool_calls` | array | No | Tool calls made by the assistant (assistant messages only) |
| `tool_call_id` | string | No | ID of the tool call this message responds to (tool messages only) |
Content Parts (Multimodal)
When content is an array, each element is a content part:
```json
[
  { "type": "text", "text": "What's in this image?" },
  { "type": "image_url", "image_url": { "url": "https://example.com/image.png" } }
]
```
Tools Array
```json
[
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string", "description": "City name" }
        },
        "required": ["location"]
      }
    }
  }
]
```
Tool Choice
| Value | Behavior |
|---|---|
| `"auto"` | Model decides whether to call tools |
| `"none"` | Model will not call any tools |
| `"required"` | Model must call at least one tool |
| `{"type": "function", "function": {"name": "..."}}` | Force a specific tool |
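When the model does call a tool, the application runs it and replies with one `tool` message per call, linked by `tool_call_id`. The sketch below illustrates that step on plain dictionaries. It assumes the tool-call object follows the OpenAI shape (`id`, `function.name`, and `function.arguments` as a JSON-encoded string), which this page does not spell out; `TOOL_REGISTRY`, `run_tool_calls`, and the stub `get_weather` are hypothetical names.

```python
import json
from typing import Any, Callable


def get_weather(location: str, unit: str = "celsius") -> dict[str, Any]:
    """Stand-in implementation; a real one would query a weather service."""
    return {"location": location, "temperature": 21, "unit": unit}


# Maps tool names from the `tools` array to local callables (hypothetical registry).
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict[str, Any]) -> list[dict[str, Any]]:
    """Execute each tool call and return one `tool` message per call."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        # Assumption: arguments arrive as a JSON-encoded string, as in the OpenAI API.
        arguments = json.loads(call["function"]["arguments"])
        output = TOOL_REGISTRY[name](**arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return results


# Example assistant message in the assumed shape.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location": "Tokyo"}'},
    }],
}
tool_messages = run_tool_calls(assistant_message)
print(tool_messages)
```

Appending `assistant_message` and then `tool_messages` to the conversation and sending it again lets the model produce its final answer from the tool output.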
Response Format
```json
{ "type": "json_object" }
```
Or with a schema:
```json
{
  "type": "json_schema",
  "json_schema": {
    "name": "my_schema",
    "strict": true,
    "schema": { ... }
  }
}
```
Response (Non-Streaming)
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1719849600,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}
```
Response Fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique completion identifier |
| `object` | string | Always `"chat.completion"` |
| `created` | integer | Unix timestamp of creation |
| `model` | string | Model used for generation |
| `choices` | array | Array of completion choices |
| `choices[].index` | integer | Choice index |
| `choices[].message` | object | Generated message |
| `choices[].finish_reason` | string | Why generation stopped |
| `usage` | object | Token usage statistics |
Finish Reasons
| Value | Description |
|---|---|
| `stop` | Natural stop, or a stop sequence was hit |
| `length` | The `max_tokens` limit was reached |
| `tool_calls` | Model invoked one or more tools |
| `content_filter` | Content was filtered |
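Callers usually branch on `finish_reason` before using the message. A minimal sketch of that branching follows; the function name `next_action` and the action labels it returns are hypothetical, and only the `finish_reason` values come from the table above.

```python
from typing import Any

# Action labels are hypothetical; the keys are the documented finish_reason values.
_ACTIONS = {
    "stop": "complete",
    "length": "truncated",
    "tool_calls": "run_tools",
    "content_filter": "filtered",
}


def next_action(choice: dict[str, Any]) -> str:
    """Map a choice's finish_reason to what the caller should do next."""
    reason = choice.get("finish_reason")
    # Unknown or missing reasons are surfaced rather than treated as success.
    return _ACTIONS.get(reason, "unknown")


choice = {
    "index": 0,
    "message": {"role": "assistant", "content": "Hello"},
    "finish_reason": "length",
}
print(next_action(choice))  # prints "truncated"
```

A `truncated` result is a cue to raise `max_tokens` or to ask the model to continue, since the message content is incomplete.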
Response (Streaming)
When `stream` is `true`, the response is delivered as Server-Sent Events (SSE):
```
Content-Type: text/event-stream
```
Each event contains a chunk:
```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1719849600,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1719849600,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1719849600,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1719849600,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```
Stream Chunk Fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Same ID across all chunks in a stream |
| `object` | string | Always `"chat.completion.chunk"` |
| `choices[].delta` | object | Incremental content (may contain `role`, `content`, `tool_calls`) |
| `choices[].finish_reason` | string or null | Set on the final chunk |
| `usage` | object or null | Included in the final chunk when `stream_options.include_usage` is true |
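For clients that read the stream without an SDK, the chunks can be reassembled by hand. The following is a minimal sketch of that, assuming each event's JSON fits on a single `data:` line (the SSE format itself allows multi-line data, which this does not handle). The function names `iter_chunks` and `collect_text` are hypothetical, and the sample lines are trimmed to the fields the parser reads.

```python
import json
from typing import Any, Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield decoded chunk objects from raw SSE lines until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the delta content of choice 0 across all chunks."""
    parts = []
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if choice.get("index") == 0 and content:
                parts.append(content)
    return "".join(parts)


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(collect_text(sample))  # prints "Hello!"
```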
Stream Usage
To receive token usage in streaming mode, include:
```json
{
  "stream": true,
  "stream_options": { "include_usage": true }
}
```
The final chunk before [DONE] will include the usage object.
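In the OpenAI API, the chunk that carries `usage` has an empty `choices` array. Whether SandBase mirrors this is an assumption, so the sketch below tolerates both shapes by never indexing into `choices` without checking it first. The function name `split_stream` and the sample chunks are hypothetical.

```python
from typing import Any, Optional


def split_stream(chunks: list[dict[str, Any]]) -> tuple[str, Optional[dict[str, Any]]]:
    """Return the concatenated text and the usage object, if one was sent."""
    text_parts = []
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]
        # Guard: a usage-only chunk may carry no choices at all.
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            text_parts.append(content)
    return "".join(text_parts), usage


chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Hi"}, "finish_reason": None}], "usage": None},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 1, "total_tokens": 13}},
]
text, usage = split_stream(chunks)
print(text, usage)
```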
Supported Models
SandBase routes each request to the optimal provider for the requested model. More than 1,400 models are available; some popular choices are:
| Model | Provider | Context Window | Capabilities |
|---|---|---|---|
| `gpt-4o` | OpenAI | 128K | chat, tools, vision, JSON mode |
| `gpt-4o-mini` | OpenAI | 128K | chat, tools, vision, JSON mode |
| `o3` | OpenAI | 200K | chat, tools, reasoning |
| `o3-mini` | OpenAI | 200K | chat, tools, reasoning |
| `claude-sonnet-4-20250514` | Anthropic | 200K | chat, tools, vision, thinking |
| `claude-3-5-haiku-20241022` | Anthropic | 200K | chat, tools, vision |
| `deepseek-chat` | DeepSeek | 64K | chat, tools |
| `deepseek-reasoner` | DeepSeek | 64K | chat, reasoning |
| `gemini-2.5-pro` | Google | 1M | chat, tools, vision, thinking |
| `gemini-2.5-flash` | Google | 1M | chat, tools, vision, thinking |
See the Models page for the complete list with pricing.
Code Examples
Basic Chat Completion
```bash
curl https://api.sandbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-sb-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
```
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-sb-your-key",
    base_url="https://api.sandbase.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7,
    max_tokens=256
)
print(response.choices[0].message.content)
```
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-sb-your-key',
  baseURL: 'https://api.sandbase.ai/v1',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
  ],
  temperature: 0.7,
  max_tokens: 256,
});
console.log(response.choices[0].message.content);
```
Streaming
```bash
curl https://api.sandbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-sb-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Write a haiku about programming."}
    ],
    "stream": true
  }'
```
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-sb-your-key",
    base_url="https://api.sandbase.ai/v1"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Write a haiku about programming."}
    ],
    stream=True
)
for chunk in stream:
    # Skip chunks without choices (for example, a usage-only chunk).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-sb-your-key',
  baseURL: 'https://api.sandbase.ai/v1',
});

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Write a haiku about programming.' },
  ],
  stream: true,
});
for await (const chunk of stream) {
  // Optional chaining skips chunks that carry no choices or no content.
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```
Function Calling
```bash
curl https://api.sandbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-sb-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a city",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"},
              "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
```
```python
import json

from openai import OpenAI

client = OpenAI(
    api_key="sk-sb-your-key",
    base_url="https://api.sandbase.ai/v1"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)

# Handle the tool call. With tool_choice="auto" the model may answer directly,
# in which case tool_calls is empty.
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    # Arguments arrive as a JSON-encoded string.
    arguments = json.loads(tool_call.function.arguments)
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {arguments}")
else:
    print(message.content)
```
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-sb-your-key',
  baseURL: 'https://api.sandbase.ai/v1',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: "What's the weather in Tokyo?" }],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get current weather for a city',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string', description: 'City name' },
            unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },
          },
          required: ['location'],
        },
      },
    },
  ],
  tool_choice: 'auto',
});

// With tool_choice 'auto' the model may answer directly, leaving tool_calls unset.
const message = response.choices[0].message;
const toolCall = message.tool_calls?.[0];
if (toolCall) {
  console.log(`Function: ${toolCall.function.name}`);
  console.log(`Arguments: ${toolCall.function.arguments}`);
} else {
  console.log(message.content);
}
```
JSON Mode
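```python
import json

from openai import OpenAI

client = OpenAI(
    api_key="sk-sb-your-key",
    base_url="https://api.sandbase.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Return data as JSON."},
        {"role": "user", "content": "List 3 programming languages with their year of creation."}
    ],
    response_format={"type": "json_object"}
)
# The message content is a JSON string; parse it into Python objects.
data = json.loads(response.choices[0].message.content)
```
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-sb-your-key',
  baseURL: 'https://api.sandbase.ai/v1',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'Return data as JSON.' },
    { role: 'user', content: 'List 3 programming languages with their year of creation.' },
  ],
  response_format: { type: 'json_object' },
});
// The message content is a JSON string; parse it into an object.
const data = JSON.parse(response.choices[0].message.content);
```
Notes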
- Model routing: SandBase automatically routes to the best available provider for the requested model. If a provider is unavailable, requests are retried on fallback providers.
- Reasoning models: Models like `o3` and `deepseek-reasoner` may include `reasoning_content` in the response delta during streaming.
- Rate limits: See Rate Limiting for details on handling 429 responses.
- Billing: Token usage is metered per request. See Billing for pricing details.
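A common client-side way to handle 429 responses is to retry with exponential backoff and jitter. The sketch below is a generic illustration under stated assumptions, not SandBase's documented policy: the Rate Limiting page remains the authority, any `Retry-After` header is ignored here, and `call_with_backoff`, its parameters, and the `send` callable (which returns a status code and a body) are all hypothetical.

```python
import random
import time
from typing import Any, Callable


def call_with_backoff(
    send: Callable[[], tuple[int, Any]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, Any]:
    """Call send() until it returns a non-429 status or attempts run out."""
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Delay doubles on each retry; jitter spreads out simultaneous clients.
        delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, base_delay)
        sleep(delay)
        status, body = send()
    return status, body


# Demo with a fake transport: two rate-limited replies, then success.
# Delays are recorded instead of slept so the example runs instantly.
responses = iter([(429, "slow down"), (429, "slow down"), (200, "ok")])
delays: list[float] = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, len(delays))
```

In real use, `send` would wrap the HTTP request or SDK call, and the default `time.sleep` would be left in place.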

