LLM Gateway — Chat Completions

SandBase exposes an OpenAI-compatible Chat Completions endpoint. Any application built with the OpenAI SDK can connect to SandBase by changing the base URL and supplying a SandBase API key; no other code changes are required.

Endpoint

POST https://api.sandbase.ai/v1/chat/completions

Authentication

http

Authorization: Bearer sk-sb-your-api-key
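
As a minimal sketch, the header can be attached with Python's standard library alone. The environment variable name `SANDBASE_API_KEY` and the helper `build_request` are assumptions made for illustration, not SandBase conventions. The snippet only builds the request object; nothing is sent.

```python
import json
import os
import urllib.request

ENDPOINT = "https://api.sandbase.ai/v1/chat/completions"


def build_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# SANDBASE_API_KEY is an assumed variable name; the fallback is the placeholder key.
api_key = os.environ.get("SANDBASE_API_KEY", "sk-sb-your-api-key")
request = build_request(
    {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hi"}]},
    api_key,
)
```

Reading the key from the environment keeps it out of source control; passing `request` to `urllib.request.urlopen` would perform the call.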

Request Body

Parameter	Type	Required	Default	Description
`model`	string	Yes	—	Model identifier (e.g., `gpt-4o`, `claude-sonnet-4-20250514`, `deepseek-chat`)
`messages`	array	Yes	—	Array of message objects representing the conversation
`temperature`	number	No	1.0	Sampling temperature (0.0–2.0). Lower = more deterministic
`top_p`	number	No	1.0	Nucleus sampling threshold
`max_tokens`	integer	No	Model default	Maximum tokens to generate
`stream`	boolean	No	`false`	Enable streaming via Server-Sent Events
`stop`	string or array	No	`null`	Stop sequence(s) to halt generation
`tools`	array	No	`null`	Tool/function definitions for function calling
`tool_choice`	string or object	No	`"auto"`	Controls tool selection behavior
`response_format`	object	No	`null`	Force structured output (JSON mode or JSON schema)
`n`	integer	No	1	Number of completions to generate
`presence_penalty`	number	No	0	Penalize tokens based on presence in text so far (-2.0–2.0)
`frequency_penalty`	number	No	0	Penalize tokens based on frequency in text so far (-2.0–2.0)
`user`	string	No	`null`	Unique user identifier for abuse monitoring
`stream_options`	object	No	`null`	Options for streaming (e.g., `include_usage`)
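
The ranges in the table can be checked on the client before a request is sent. The following is a sketch only: `build_body` is a hypothetical helper, and the server remains the authority on which values are accepted.

```python
def build_body(model: str, messages: list, **options) -> dict:
    """Assemble a request body, rejecting values outside the documented ranges."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")

    # Ranges taken from the parameter table above.
    ranges = {
        "temperature": (0.0, 2.0),
        "presence_penalty": (-2.0, 2.0),
        "frequency_penalty": (-2.0, 2.0),
    }
    for name, (low, high) in ranges.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    # Leave unset options out of the body so the documented defaults apply.
    body = {"model": model, "messages": messages}
    body.update({key: value for key, value in options.items() if value is not None})
    return body


body = build_body(
    "gpt-4o",
    [{"role": "user", "content": "Hello"}],
    temperature=0.2,
    max_tokens=64,
    stop=None,
)
```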

Messages Array

Each message object has the following structure:

Field	Type	Required	Description
`role`	string	Yes	One of: `system`, `user`, `assistant`, `tool`
`content`	string or array	Yes	Message content (text or content parts array)
`name`	string	No	Optional name for the participant
`tool_calls`	array	No	Tool calls made by the assistant (assistant messages only)
`tool_call_id`	string	No	ID of the tool call this message responds to (tool messages only)
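
A multi-turn conversation is the list of earlier messages followed by the new user turn. The sketch below builds one and checks each message against the field rules in the table; `check_message` is a hypothetical helper, not part of any SDK.

```python
ROLES = {"system", "user", "assistant", "tool"}


def check_message(message: dict) -> None:
    """Raise ValueError if a message breaks the field rules in the table above."""
    role = message.get("role")
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if "content" not in message:
        raise ValueError("content is required")
    if "tool_calls" in message and role != "assistant":
        raise ValueError("tool_calls is only valid on assistant messages")
    if "tool_call_id" in message and role != "tool":
        raise ValueError("tool_call_id is only valid on tool messages")


# Earlier turns are resent in order so the model sees the whole conversation.
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"},
]
for message in conversation:
    check_message(message)
```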

Content Parts (Multimodal)

When `content` is an array, each element is a content part:

json

[
  { "type": "text", "text": "What's in this image?" },
  { "type": "image_url", "image_url": { "url": "https://example.com/image.png" } }
]
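
In Python, the same content-part array can be produced by a small helper. This is a sketch; `user_message_with_image` is a hypothetical name, and the model must list vision among its capabilities.

```python
def user_message_with_image(text: str, image_url: str) -> dict:
    """Build a user message whose content is a text part followed by an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


message = user_message_with_image(
    "What's in this image?",
    "https://example.com/image.png",
)
```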

Tools Array

json

[
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string", "description": "City name" }
        },
        "required": ["location"]
      }
    }
  }
]
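
Declaring a tool only tells the model that the function exists; the application still has to run it. The sketch below shows one way to map a returned tool call onto a local function and build the `tool` message that answers it. The `get_weather` body, the registry, and the tool call's `id` and `type` fields (which follow the OpenAI format) are assumptions for illustration.

```python
import json


def get_weather(location: str) -> dict:
    """Hypothetical local implementation; a real one would query a weather service."""
    return {"location": location, "temperature_c": 21}


# Maps the tool names declared in the request to local Python callables.
REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and return the `tool` message that answers it."""
    function = tool_call["function"]
    arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
    result = REGISTRY[function["name"]](**arguments)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }


# Hypothetical tool call, shaped like one entry of an assistant message's tool_calls.
tool_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"location": "Tokyo"}'},
}
tool_message = run_tool_call(tool_call)
```

Appending the assistant message and `tool_message` to `messages` and calling the endpoint again lets the model produce its final answer.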

Tool Choice

Value	Behavior
`"auto"`	Model decides whether to call tools
`"none"`	Model will not call any tools
`"required"`	Model must call at least one tool
`{"type": "function", "function": {"name": "..."}}`	Force a specific tool
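
The object form is easy to mistype, so a helper can build it and confirm that the named function was actually declared. A sketch, with `force_tool` as a hypothetical helper:

```python
def force_tool(name: str, tools: list) -> dict:
    """Build the tool_choice object that forces one declared function."""
    declared = {tool["function"]["name"] for tool in tools}
    if name not in declared:
        raise ValueError(f"{name!r} is not among the declared tools")
    return {"type": "function", "function": {"name": name}}


tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"],
            },
        },
    }
]
tool_choice = force_tool("get_weather", tools)
```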

Response Format

json

{ "type": "json_object" }

or with a schema:

json

{
  "type": "json_schema",
  "json_schema": {
    "name": "my_schema",
    "strict": true,
    "schema": { ... }
  }
}
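
For a concrete picture, here is a hypothetical filled-in schema written as a Python dict, together with a reply that conforms to it. The schema name `language_list` and its fields are invented for illustration, and which JSON Schema keywords strict mode accepts may vary by model.

```python
import json

# Hypothetical schema; neither the name nor the fields come from SandBase.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "language_list",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "languages": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "year": {"type": "integer"},
                        },
                        "required": ["name", "year"],
                        "additionalProperties": False,
                    },
                }
            },
            "required": ["languages"],
            "additionalProperties": False,
        },
    },
}

# A reply that conforms to the schema parses into plain Python types.
sample_reply = '{"languages": [{"name": "Python", "year": 1991}]}'
data = json.loads(sample_reply)
first = data["languages"][0]
```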

Response (Non-Streaming)

json

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1719849600,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}

Response Fields

Field	Type	Description
`id`	string	Unique completion identifier
`object`	string	Always `"chat.completion"`
`created`	integer	Unix timestamp of creation
`model`	string	Model used for generation
`choices`	array	Array of completion choices
`choices[].index`	integer	Choice index
`choices[].message`	object	Generated message
`choices[].finish_reason`	string	Why generation stopped
`usage`	object	Token usage statistics

Finish Reasons

Value	Description
`stop`	Natural stop or hit a stop sequence
`length`	Hit `max_tokens` limit
`tool_calls`	Model invoked one or more tools
`content_filter`	Content was filtered
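
Client code usually branches on `finish_reason` before using the message. The sketch below works on a decoded choice dict; `handle_choice` and the strings it returns are hypothetical.

```python
def handle_choice(choice: dict) -> str:
    """Return a short description of how to treat one completion choice."""
    reason = choice["finish_reason"]
    message = choice["message"]
    content = message.get("content") or ""
    if reason == "stop":
        return content
    if reason == "length":
        # Output was cut off at max_tokens; a caller may retry with a higher limit.
        return content + " [truncated]"
    if reason == "tool_calls":
        names = [call["function"]["name"] for call in message["tool_calls"]]
        return "tools requested: " + ", ".join(names)
    if reason == "content_filter":
        return "[filtered]"
    raise ValueError(f"unexpected finish_reason: {reason!r}")


choice = {
    "index": 0,
    "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
    "finish_reason": "stop",
}
result = handle_choice(choice)
```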

Response (Streaming)

When `stream` is `true`, the response is delivered as Server-Sent Events (SSE):

Content-Type: text/event-stream

Each event contains a chunk:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1719849600,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1719849600,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1719849600,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1719849600,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
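
Without an SDK, a client has to strip the `data:` prefix, stop at `[DONE]`, and join the `content` deltas. The sketch below does that for an iterable of already-decoded lines. It assumes a single choice (`n` = 1), and `collect_stream_text` is a hypothetical helper.

```python
import json


def collect_stream_text(lines) -> str:
    """Concatenate the content deltas from an iterable of SSE lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Abbreviated versions of the events shown above.
events = [
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
text = collect_stream_text(events)
```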

Stream Chunk Fields

Field	Type	Description
`id`	string	Same ID across all chunks in a stream
`object`	string	Always `"chat.completion.chunk"`
`choices[].delta`	object	Incremental content (may contain `role`, `content`, `tool_calls`)
`choices[].finish_reason`	string or null	Set on the final chunk
`usage`	object or null	Included in final chunk when `stream_options.include_usage` is true

Stream Usage

To receive token usage in streaming mode, include:

json

{
  "stream": true,
  "stream_options": { "include_usage": true }
}

The final chunk before `[DONE]` will include the `usage` object.
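
A stream reader can pick up usage alongside the text. The sketch below works on decoded chunk dicts and assumes, following the OpenAI format, that the usage-bearing chunk may carry an empty `choices` list; `read_stream` and the sample numbers are hypothetical.

```python
def read_stream(chunks):
    """Return (text, usage) collected from decoded stream chunks."""
    parts, usage = [], None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]
        # `or []` also covers chunks where choices is missing or null.
        for choice in chunk.get("choices") or []:
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts), usage


chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
    # Assumed shape of the usage-bearing chunk: no choices, only usage.
    {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 1, "total_tokens": 13}},
]
text, usage = read_stream(chunks)
```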

Supported Models

SandBase routes each request to the optimal provider for the requested model. More than 1,400 models are available; some popular choices:

Model	Provider	Context Window	Capabilities
`gpt-4o`	OpenAI	128K	chat, tools, vision, JSON mode
`gpt-4o-mini`	OpenAI	128K	chat, tools, vision, JSON mode
`o3`	OpenAI	200K	chat, tools, reasoning
`o3-mini`	OpenAI	200K	chat, tools, reasoning
`claude-sonnet-4-20250514`	Anthropic	200K	chat, tools, vision, thinking
`claude-3-5-haiku-20241022`	Anthropic	200K	chat, tools, vision
`deepseek-chat`	DeepSeek	64K	chat, tools
`deepseek-reasoner`	DeepSeek	64K	chat, reasoning
`gemini-2.5-pro`	Google	1M	chat, tools, vision, thinking
`gemini-2.5-flash`	Google	1M	chat, tools, vision, thinking

See the Models page for the complete list with pricing.

Code Examples

Basic Chat Completion

bash

curl https://api.sandbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-sb-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

python

from openai import OpenAI

client = OpenAI(
    api_key="sk-sb-your-key",
    base_url="https://api.sandbase.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7,
    max_tokens=256
)

print(response.choices[0].message.content)

javascript

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-sb-your-key',
  baseURL: 'https://api.sandbase.ai/v1',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
  ],
  temperature: 0.7,
  max_tokens: 256,
});

console.log(response.choices[0].message.content);

Streaming

bash

curl https://api.sandbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-sb-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Write a haiku about programming."}
    ],
    "stream": true
  }'

python

from openai import OpenAI

client = OpenAI(
    api_key="sk-sb-your-key",
    base_url="https://api.sandbase.ai/v1"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Write a haiku about programming."}
    ],
    stream=True
)

for chunk in stream:
    # Guard against chunks with an empty choices list (e.g. a usage-only chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

javascript

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-sb-your-key',
  baseURL: 'https://api.sandbase.ai/v1',
});

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Write a haiku about programming.' },
  ],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

Function Calling

bash

curl https://api.sandbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-sb-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a city",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"},
              "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'

python

from openai import OpenAI
import json

client = OpenAI(
    api_key="sk-sb-your-key",
    base_url="https://api.sandbase.ai/v1"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)

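# Handle the tool call; with tool_choice="auto" the model may answer directly instead
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    arguments = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {arguments}")
else:
    print(message.content)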

javascript

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-sb-your-key',
  baseURL: 'https://api.sandbase.ai/v1',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: "What's the weather in Tokyo?" }],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get current weather for a city',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string', description: 'City name' },
            unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },
          },
          required: ['location'],
        },
      },
    },
  ],
  tool_choice: 'auto',
});

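// With tool_choice 'auto' the model may answer directly, so tool_calls can be absent
const toolCall = response.choices[0].message.tool_calls?.[0];
if (toolCall) {
  console.log(`Function: ${toolCall.function.name}`);
  console.log(`Arguments: ${toolCall.function.arguments}`);
}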

JSON Mode

python

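import json

from openai import OpenAI

client = OpenAI(
    api_key="sk-sb-your-key",
    base_url="https://api.sandbase.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Return data as JSON."},
        {"role": "user", "content": "List 3 programming languages with their year of creation."}
    ],
    response_format={"type": "json_object"}
)

# The JSON object arrives as a string in message.content; parse it before use
data = json.loads(response.choices[0].message.content)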

javascript

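import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-sb-your-key',
  baseURL: 'https://api.sandbase.ai/v1',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'Return data as JSON.' },
    { role: 'user', content: 'List 3 programming languages with their year of creation.' },
  ],
  response_format: { type: 'json_object' },
});

// The JSON object arrives as a string in message.content; parse it before use
const data = JSON.parse(response.choices[0].message.content);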

Notes

Model routing: SandBase automatically routes to the best available provider for the requested model. If a provider is unavailable, requests are retried on fallback providers.
Reasoning models: Models like `o3` and `deepseek-reasoner` may include `reasoning_content` in the response delta during streaming.
Rate limits: See Rate Limiting for details on handling 429 responses.
Billing: Token usage is metered per request. See Billing for pricing details.
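
For the reasoning-model note above, a client may want to keep reasoning text apart from the final answer. The sketch below assumes `reasoning_content` appears as a sibling of `content` inside each delta, as the note suggests; `split_reasoning` and the sample deltas are hypothetical.

```python
def split_reasoning(deltas):
    """Return (reasoning, answer) accumulated from a sequence of delta dicts."""
    reasoning, answer = [], []
    for delta in deltas:
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        if delta.get("content"):
            answer.append(delta["content"])
    return "".join(reasoning), "".join(answer)


# Hypothetical deltas from a reasoning model's stream.
deltas = [
    {"role": "assistant"},
    {"reasoning_content": "2 + 2 is 4. "},
    {"reasoning_content": "Double-check."},
    {"content": "4"},
]
reasoning, answer = split_reasoning(deltas)
```

Models that never emit `reasoning_content` simply yield an empty reasoning string, so the same reader works for both kinds of model.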
