LLM Gateway — Chat Completions

SandBase exposes an OpenAI-compatible Chat Completions endpoint. Any application built with the OpenAI SDK can connect to SandBase by changing the base URL and using a SandBase API key; no other code changes are required.

Endpoint

POST https://api.sandbase.ai/v1/chat/completions

Authentication

Send your SandBase API key as a Bearer token in the Authorization header:

http
Authorization: Bearer sk-sb-your-api-key
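Outside the OpenAI SDK, any HTTP client works as long as it sends this header. The sketch below uses only Python's standard library; build_chat_request and API_BASE are illustrative names, not part of any SandBase SDK, and the request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.sandbase.ai/v1"  # illustrative constant


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Chat Completions endpoint."""
    return urllib.request.Request(
        url=f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # SandBase key as a Bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "sk-sb-your-key",
    {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hi"}]},
)
# To send it: urllib.request.urlopen(req) returns the HTTP response.
```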

Request Body

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | | Model identifier (e.g., gpt-4o, claude-sonnet-4-20250514, deepseek-chat) |
| messages | array | Yes | | Array of message objects representing the conversation |
| temperature | number | No | 1.0 | Sampling temperature (0.0–2.0); lower values make output more deterministic |
| top_p | number | No | 1.0 | Nucleus sampling threshold: only tokens within the top top_p probability mass are considered |
| max_tokens | integer | No | Model default | Maximum number of tokens to generate |
| stream | boolean | No | false | Stream the response as Server-Sent Events |
| stop | string or array | No | null | Stop sequence(s) that halt generation |
| tools | array | No | null | Tool (function) definitions for function calling |
| tool_choice | string or object | No | "auto" | Controls which tool, if any, the model calls |
| response_format | object | No | null | Forces structured output (JSON mode or JSON Schema) |
| n | integer | No | 1 | Number of completions (choices) to generate |
| presence_penalty | number | No | 0 | Penalizes tokens that have already appeared in the text so far (-2.0–2.0) |
| frequency_penalty | number | No | 0 | Penalizes tokens in proportion to how often they have appeared so far (-2.0–2.0) |
| user | string | No | null | Unique end-user identifier for abuse monitoring |
| stream_options | object | No | null | Streaming options (e.g., include_usage); applies only when stream is true |
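To catch obvious mistakes before a request leaves your machine, you could check a body against the ranges listed above. This is a hypothetical client-side helper (check_request) that encodes only what the table states; the gateway's own validation, and any model-specific limits, may differ.

```python
# Numeric ranges taken from the Request Body table (client-side assumption only).
RANGES = {
    "temperature": (0.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}
REQUIRED = ("model", "messages")


def check_request(body: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = [f"missing required field: {key}" for key in REQUIRED if key not in body]
    for key, (low, high) in RANGES.items():
        value = body.get(key)
        if value is not None and not low <= value <= high:
            problems.append(f"{key}={value} is outside [{low}, {high}]")
    # stream_options only has an effect on streaming requests
    if body.get("stream_options") and not body.get("stream"):
        problems.append("stream_options is only meaningful when stream is true")
    return problems


print(check_request({"model": "gpt-4o", "messages": [], "temperature": 2.5}))
# ['temperature=2.5 is outside [0.0, 2.0]']
```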

Messages Array

Each message object has the following structure:

| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | One of: system, user, assistant, tool |
| content | string or array | Yes | Message content (text or a content parts array); assistant messages that carry tool_calls may set it to null |
| name | string | No | Optional name for the participant |
| tool_calls | array | No | Tool calls made by the assistant (assistant messages only) |
| tool_call_id | string | No | ID of the tool call this message responds to (tool messages only) |
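A similar sketch can check individual message objects against the rules in this table. check_message is hypothetical, and two of its rules are assumptions borrowed from the OpenAI convention rather than stated above: tool messages must carry a tool_call_id, and content may be null only on assistant messages that carry tool_calls.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}


def check_message(msg: dict) -> list[str]:
    """Check one message object against the field rules in the Messages table."""
    problems = []
    role = msg.get("role")
    if role not in VALID_ROLES:
        problems.append(f"invalid role: {role!r}")
    if "tool_calls" in msg and role != "assistant":
        problems.append("tool_calls is only allowed on assistant messages")
    if "tool_call_id" in msg and role != "tool":
        problems.append("tool_call_id is only allowed on tool messages")
    if role == "tool" and "tool_call_id" not in msg:
        # Assumption (OpenAI convention): a tool result names the call it answers.
        problems.append("tool messages need a tool_call_id")
    if msg.get("content") is None and not msg.get("tool_calls"):
        # Assumption (OpenAI convention): null content only alongside tool_calls.
        problems.append("content is required unless the assistant is calling tools")
    return problems
```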

Content Parts (Multimodal)

When content is an array, each element is a content part:

json
[
  { "type": "text", "text": "What's in this image?" },
  { "type": "image_url", "image_url": { "url": "https://example.com/image.png" } }
]
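In Python, a multimodal user message like this can be assembled as a plain dict and placed in messages. The helper name user_message_with_image is illustrative; send such messages only to models whose capabilities include vision (see Supported Models).

```python
def user_message_with_image(text: str, image_url: str) -> dict:
    """Build a user message whose content is a text part plus an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


message = user_message_with_image(
    "What's in this image?", "https://example.com/image.png"
)
```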

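Tools Array

Each tool describes a function the model may call; its parameters are declared as a JSON Schema object: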

json
[
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string", "description": "City name" }
        },
        "required": ["location"]
      }
    }
  }
]

Tool Choice

| Value | Behavior |
|---|---|
| "auto" | Model decides whether to call tools |
| "none" | Model will not call any tools |
| "required" | Model must call at least one tool |
| {"type": "function", "function": {"name": "..."}} | Force a specific tool |

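Response Format

For JSON mode, set response_format to the object below. With OpenAI models, JSON mode also requires the word "JSON" to appear somewhere in the messages (the JSON Mode example under Code Examples includes it in the system message).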

json
{ "type": "json_object" }

Or, to require output that matches a specific JSON Schema:

json
{
  "type": "json_schema",
  "json_schema": {
    "name": "my_schema",
    "strict": true,
    "schema": { ... }
  }
}
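As an illustration of the json_schema form, the snippet below fills in a hypothetical schema and parses a sample reply string; the schema, its name, and the reply are made up for the example. OpenAI's strict mode expects additionalProperties: false and every property listed in required, so the sketch follows that; whether other providers behind SandBase apply the same rules is an assumption to verify.

```python
import json

# Hypothetical schema: a list of programming languages with creation years.
language_schema = {
    "type": "object",
    "properties": {
        "languages": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "year": {"type": "integer"},
                },
                "required": ["name", "year"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["languages"],
    "additionalProperties": False,
}

# Pass this as response_format in the request body.
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "languages", "strict": True, "schema": language_schema},
}

# The reply arrives as a JSON string in choices[0].message.content (sample string).
reply = '{"languages": [{"name": "Python", "year": 1991}]}'
data = json.loads(reply)
```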

Response (Non-Streaming)

json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1719849600,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}
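If you call the endpoint without the SDK, the body is plain JSON. This sketch parses the sample response above with the standard library and pulls out the text, finish reason, and total token count; summarize is a hypothetical helper.

```python
import json

raw = """{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1719849600,
  "model": "gpt-4o",
  "choices": [{"index": 0,
               "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21}
}"""


def summarize(completion: dict) -> tuple[str, str, int]:
    """Return (text, finish_reason, total_tokens) from the first choice."""
    choice = completion["choices"][0]
    text = choice["message"].get("content") or ""  # content can be null on tool calls
    return text, choice["finish_reason"], completion["usage"]["total_tokens"]


text, reason, tokens = summarize(json.loads(raw))
print(text, reason, tokens)  # Hello! How can I help you today? stop 21
```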

Response Fields

| Field | Type | Description |
|---|---|---|
| id | string | Unique completion identifier |
| object | string | Always "chat.completion" |
| created | integer | Unix timestamp of creation |
| model | string | Model used for generation |
| choices | array | Array of completion choices |
| choices[].index | integer | Choice index |
| choices[].message | object | Generated message |
| choices[].finish_reason | string | Why generation stopped |
| usage | object | Token usage statistics |

Finish Reasons

| Value | Description |
|---|---|
| stop | Natural stop, or a stop sequence was reached |
| length | Hit the max_tokens limit |
| tool_calls | Model invoked one or more tools |
| content_filter | Content was filtered |
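Code that consumes completions usually branches on finish_reason. A minimal sketch that maps each documented value to a next step; the action labels and the "unknown" fallback are illustrative choices, not part of the API.

```python
def next_step(choice: dict) -> str:
    """Map a choice's finish_reason to what the caller should do next."""
    actions = {
        "stop": "done",                # natural end or a stop sequence
        "length": "truncated",         # hit max_tokens; output is incomplete
        "tool_calls": "run_tools",     # execute the tools, then send results back
        "content_filter": "filtered",  # content was withheld
    }
    # Illustrative fallback for any value not listed in the table
    return actions.get(choice.get("finish_reason"), "unknown")
```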

Response (Streaming)

When "stream": true is set, the response is delivered as Server-Sent Events (SSE):

Content-Type: text/event-stream

Each event is a data: line holding one JSON chunk, and the stream ends with data: [DONE]:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1719849600,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1719849600,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1719849600,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1719849600,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
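Without the SDK, the stream can be consumed line by line. This sketch assumes each event is a single data: line, as in the example above, and a single choice (n = 1); collect_stream is a hypothetical name. With urllib, for example, you could pass it (raw.decode("utf-8") for raw in response).

```python
import json


def collect_stream(lines):
    """Concatenate delta.content from SSE data: lines until data: [DONE].

    Assumes one data: line per event and a single choice (n = 1).
    """
    parts = []
    finish_reason = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices") or []:
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


sample = [
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # ('Hello', 'stop')
```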

Stream Chunk Fields

| Field | Type | Description |
|---|---|---|
| id | string | Same ID across all chunks in a stream |
| object | string | Always "chat.completion.chunk" |
| choices[].delta | object | Incremental content (may contain role, content, tool_calls) |
| choices[].finish_reason | string or null | Set on the final chunk |
| usage | object or null | Included in the final chunk when stream_options.include_usage is true |
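Because delta can contain tool_calls, a streamed tool call arrives in fragments that must be stitched together. The sketch below assumes the OpenAI streaming convention: each fragment carries an index, the first fragment for a call also carries its id and function name, and later fragments append pieces of the JSON arguments string. merge_tool_call_deltas is a hypothetical helper that takes the parsed delta objects in order.

```python
def merge_tool_call_deltas(deltas):
    """Rebuild complete tool calls from streamed delta.tool_calls fragments."""
    calls = {}  # keyed by the fragment's index
    for delta in deltas:
        for frag in delta.get("tool_calls") or []:
            call = calls.setdefault(
                frag["index"],
                {"id": None, "type": "function", "function": {"name": "", "arguments": ""}},
            )
            if frag.get("id"):
                call["id"] = frag["id"]
            fn = frag.get("function") or {}
            if fn.get("name"):
                call["function"]["name"] = fn["name"]
            if fn.get("arguments"):
                # arguments is streamed as pieces of one JSON string
                call["function"]["arguments"] += fn["arguments"]
    return [calls[i] for i in sorted(calls)]
```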

Stream Usage

To receive token usage in streaming mode, include:

json
{
  "stream": true,
  "stream_options": { "include_usage": true }
}

The final chunk before [DONE] will include the usage object.
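With the OpenAI API, this usage chunk has an empty choices array and earlier chunks carry "usage": null. Assuming SandBase passes that shape through, code that indexes choices[0] on every chunk will fail on the last one. A defensive sketch on raw SSE lines (stream_text_and_usage is a hypothetical name):

```python
import json


def stream_text_and_usage(lines):
    """Return (text, usage) from SSE lines sent with include_usage enabled."""
    text, usage = [], None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]
        # The usage chunk may carry an empty choices list, so never index [0] blindly
        for choice in chunk.get("choices") or []:
            text.append(choice.get("delta", {}).get("content") or "")
    return "".join(text), usage
```

With the Python SDK, the equivalent guard is to check if chunk.choices: before reading chunk.choices[0].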

Supported Models

SandBase routes each request to the best available provider for the requested model. More than 1,400 models are available; popular choices include:

| Model | Provider | Context Window | Capabilities |
|---|---|---|---|
| gpt-4o | OpenAI | 128K | chat, tools, vision, JSON mode |
| gpt-4o-mini | OpenAI | 128K | chat, tools, vision, JSON mode |
| o3 | OpenAI | 200K | chat, tools, reasoning |
| o3-mini | OpenAI | 200K | chat, tools, reasoning |
| claude-sonnet-4-20250514 | Anthropic | 200K | chat, tools, vision, thinking |
| claude-3-5-haiku-20241022 | Anthropic | 200K | chat, tools, vision |
| deepseek-chat | DeepSeek | 64K | chat, tools |
| deepseek-reasoner | DeepSeek | 64K | chat, reasoning |
| gemini-2.5-pro | Google | 1M | chat, tools, vision, thinking |
| gemini-2.5-flash | Google | 1M | chat, tools, vision, thinking |

See the Models page for the complete list with pricing.

Code Examples

Basic Chat Completion

bash
curl https://api.sandbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-sb-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
python
from openai import OpenAI

client = OpenAI(
    api_key="sk-sb-your-key",
    base_url="https://api.sandbase.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7,
    max_tokens=256
)

print(response.choices[0].message.content)
javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-sb-your-key',
  baseURL: 'https://api.sandbase.ai/v1',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
  ],
  temperature: 0.7,
  max_tokens: 256,
});

console.log(response.choices[0].message.content);

Streaming

bash
curl https://api.sandbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-sb-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Write a haiku about programming."}
    ],
    "stream": true
  }'
python
from openai import OpenAI

client = OpenAI(
    api_key="sk-sb-your-key",
    base_url="https://api.sandbase.ai/v1"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Write a haiku about programming."}
    ],
    stream=True
)

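for chunk in stream:
    # Guard against chunks with an empty choices list (e.g. a usage-only chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()  # finish with a newline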
javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-sb-your-key',
  baseURL: 'https://api.sandbase.ai/v1',
});

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Write a haiku about programming.' },
  ],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

Function Calling

bash
curl https://api.sandbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-sb-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a city",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"},
              "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
python
from openai import OpenAI
import json

client = OpenAI(
    api_key="sk-sb-your-key",
    base_url="https://api.sandbase.ai/v1"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)

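# Handle the tool call, if the model made one
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)  # arguments is a JSON string
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {args}")
else:
    print(message.content)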
javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-sb-your-key',
  baseURL: 'https://api.sandbase.ai/v1',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: "What's the weather in Tokyo?" }],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get current weather for a city',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string', description: 'City name' },
            unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },
          },
          required: ['location'],
        },
      },
    },
  ],
  tool_choice: 'auto',
});

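// Handle the tool call, if the model made one
const message = response.choices[0].message;
if (message.tool_calls?.length) {
  const toolCall = message.tool_calls[0];
  console.log(`Function: ${toolCall.function.name}`);
  console.log(`Arguments: ${toolCall.function.arguments}`);
} else {
  console.log(message.content);
}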
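The examples above stop after reading the requested call. To finish the exchange, append the assistant message that contains tool_calls, then one tool message per call whose tool_call_id matches the call's id, and send the extended messages list back with the same tools. The sketch below works on plain dicts; get_weather is a stub returning fake data, and HANDLERS and tool_result_messages are hypothetical names.

```python
import json


# Hypothetical local implementation of the get_weather tool (stub data)
def get_weather(location: str, unit: str = "celsius") -> dict:
    return {"location": location, "temperature": 22, "unit": unit}


HANDLERS = {"get_weather": get_weather}


def tool_result_messages(assistant_message: dict) -> list[dict]:
    """Run each requested tool and build the messages for the follow-up request."""
    follow_up = [assistant_message]  # echo the assistant turn that holds tool_calls
    for call in assistant_message.get("tool_calls") or []:
        handler = HANDLERS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments is a JSON string
        follow_up.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to the call that requested it
            "content": json.dumps(handler(**args)),
        })
    return follow_up
```

With the Python SDK you would first convert response.choices[0].message into a dict holding its role, content, and tool_calls, extend your messages list with the returned messages, and call client.chat.completions.create again.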

JSON Mode

python
import json

# Assumes `client` is the SandBase-configured OpenAI client from the examples above
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Return data as JSON."},
        {"role": "user", "content": "List 3 programming languages with their year of creation."}
    ],
    response_format={"type": "json_object"}
)

data = json.loads(response.choices[0].message.content)
javascript
// Assumes `client` is the SandBase-configured OpenAI client from the examples above
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'Return data as JSON.' },
    { role: 'user', content: 'List 3 programming languages with their year of creation.' },
  ],
  response_format: { type: 'json_object' },
});

const data = JSON.parse(response.choices[0].message.content);
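JSON mode constrains the output format, but a reply cut off by max_tokens (finish_reason "length") is typically incomplete JSON. A small guard, assuming the choice is available as a dict; parse_json_choice is a hypothetical helper.

```python
import json


def parse_json_choice(choice: dict):
    """Parse a JSON-mode reply, refusing output that was cut off by max_tokens."""
    if choice.get("finish_reason") == "length":
        # A truncated reply is usually invalid JSON, so fail with a clear message
        raise ValueError("reply hit max_tokens; increase max_tokens and retry")
    return json.loads(choice["message"]["content"])
```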

Notes

  • Model routing: SandBase automatically routes to the best available provider for the requested model. If a provider is unavailable, requests are retried on fallback providers.
  • Reasoning models: Models like o3 and deepseek-reasoner may include reasoning_content in the response delta during streaming.
  • Rate limits: See Rate Limiting for details on handling 429 responses.
  • Billing: Token usage is metered per request. See Billing for pricing details.
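For the rate-limit note, a common client-side pattern is exponential backoff with jitter. The sketch below is generic and hypothetical: pass a function that makes the request and a predicate that recognizes a 429. With the openai Python SDK that predicate could be lambda e: isinstance(e, openai.RateLimitError); note that the SDK also retries some errors on its own (configurable with max_retries). See Rate Limiting for SandBase's actual limits.

```python
import random
import time


def with_backoff(call, is_rate_limited, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry call() with exponential backoff and jitter while it is rate-limited.

    is_rate_limited(exc) decides whether an exception is a 429; any other
    error, or a 429 on the last attempt, is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            if not is_rate_limited(exc) or attempt == max_attempts - 1:
                raise
            # base_delay * 1, 2, 4, ... scaled by a random factor in [1, 2)
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```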