Billing & Pricing

SandBase uses a pay-as-you-go credit system. You purchase credits, and they're consumed based on the models you use and the tokens processed.

Credit System

How Credits Work

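1 credit = $1 USD. Credits are consumed per token at each model's rate. In the formulas below, input_price and output_price are per-token prices, that is, the per-1M rates listed under Per-Model Pricing divided by 1,000,000:

Cost per request = (input_tokens × input_price) + (output_tokens × output_price)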

For models with special token types (cached input, cache writes, reasoning), the formula extends to:

Total cost = (prompt_tokens × input_price)
           + (cached_tokens × input_price × cache_read_multiplier)
           + (cache_creation_tokens × input_price × cache_write_multiplier)
           + (completion_tokens × output_price)
           + (reasoning_tokens × reasoning_price)
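
As a minimal sketch, the formula above can be written as a small Python function. `ModelPricing` and `request_cost` are hypothetical names, not part of any SandBase SDK. Two assumptions are baked in: prices are stored per 1M tokens (as in the pricing tables) and divided down at the end, and `prompt_tokens` counts only uncached input tokens, so cached tokens are not billed twice. When no separate reasoning price is given, the sketch falls back to the output rate.

```python
from dataclasses import dataclass
from typing import Optional

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical container for one model's prices, in USD per 1M tokens."""
    input_per_m: float
    output_per_m: float
    reasoning_per_m: Optional[float] = None  # None: assume the output rate applies
    cache_read_multiplier: float = 1.0
    cache_write_multiplier: float = 1.0


def request_cost(
    pricing: ModelPricing,
    prompt_tokens: int,
    completion_tokens: int,
    cached_tokens: int = 0,
    cache_creation_tokens: int = 0,
    reasoning_tokens: int = 0,
) -> float:
    """Return the cost in USD (credits), term by term as in the formula above."""
    reasoning_per_m = (
        pricing.output_per_m if pricing.reasoning_per_m is None else pricing.reasoning_per_m
    )
    total_per_m_units = (
        prompt_tokens * pricing.input_per_m
        + cached_tokens * pricing.input_per_m * pricing.cache_read_multiplier
        + cache_creation_tokens * pricing.input_per_m * pricing.cache_write_multiplier
        + completion_tokens * pricing.output_per_m
        + reasoning_tokens * reasoning_per_m
    )
    # Prices are per 1M tokens, so scale down once at the end.
    return total_per_m_units / TOKENS_PER_MILLION


gpt_4o = ModelPricing(input_per_m=2.50, output_per_m=10.00)
print(request_cost(gpt_4o, prompt_tokens=1_200, completion_tokens=300))  # 0.006
```

Dividing once at the end keeps the intermediate arithmetic in whole per-1M units, which avoids some floating-point noise from very small per-token prices.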

Top-Up Flow

  1. Navigate to Dashboard → Billing
  2. Click Add Credits
  3. Choose an amount ($5, $20, $50, $100, or custom)
  4. Complete payment via Stripe (credit card, Apple Pay, Google Pay)
  5. Credits are available immediately

TIP

Credits never expire. There's no monthly subscription — you only pay for what you use.

Checking Your Balance

python
import requests

response = requests.get(
    "https://api.sandbase.ai/v1/account/balance",
    headers={"Authorization": "Bearer sk-sb-your-key"}
)
print(response.json())
# {"balance": 42.50, "currency": "USD"}

bash
curl https://api.sandbase.ai/v1/account/balance \
  -H "Authorization: Bearer sk-sb-your-key"

Per-Model Pricing

Pricing is per 1 million tokens (1M tokens). All prices are in USD.

OpenAI Models

| Model | Input (per 1M) | Output (per 1M) | Notes |
|---|---|---|---|
| gpt-4o | $2.50 | $10.00 | Best value for complex tasks |
| gpt-4o-mini | $0.15 | $0.60 | Great for simple tasks |
| o3 | $10.00 | $40.00 | Reasoning tokens billed at output rate |
| o3-mini | $1.10 | $4.40 | Budget reasoning model |

Anthropic Models

| Model | Input (per 1M) | Output (per 1M) | Notes |
|---|---|---|---|
| claude-sonnet-4 | $3.00 | $15.00 | Best for coding and analysis |
| claude-3.5-haiku | $0.80 | $4.00 | Fast and affordable |

Google Models

| Model | Input (per 1M) | Output (per 1M) | Notes |
|---|---|---|---|
| gemini-2.5-pro | $1.25 | $10.00 | 1M context window |
| gemini-2.5-flash | $0.15 | $0.60 | Fastest response times |

DeepSeek Models

| Model | Input (per 1M) | Output (per 1M) | Notes |
|---|---|---|---|
| deepseek-v3 | $0.27 | $1.10 | Strong general-purpose |
| deepseek-chat | $0.27 | $1.10 | Optimized for conversation |
| deepseek-reasoner | $0.55 | $2.19 | Chain-of-thought reasoning |

Open-Source Models

| Model | Input (per 1M) | Output (per 1M) | Notes |
|---|---|---|---|
| llama-4-maverick | $0.20 | $0.60 | Meta's latest open model |
| qwen3-32b | $0.10 | $0.30 | Alibaba's efficient model |

Other Models

| Model | Input (per 1M) | Output (per 1M) | Notes |
|---|---|---|---|
| seed-1.6 | $0.50 | $2.00 | ByteDance multimodal |

INFO

Prices are subject to change. Check the Models page for the latest pricing.

Caching Discounts

Models that support prompt caching bill repeated prompt content at a discount when it is read from the cache. Some providers also charge a premium for writing to the cache:

| Provider | Cache Read Discount | Cache Write Premium |
|---|---|---|
| Anthropic | 90% off (0.1× input price) | 25% premium (1.25× input price) |
| OpenAI | 50% off (0.5× input price) | No premium |
| DeepSeek | 90% off (0.1× input price) | No premium |

Example: A cached Anthropic request with 10,000 cached input tokens at $3.00/1M:

  • Without cache: 10,000 × $3.00/1M = $0.03
  • With cache hit: 10,000 × $3.00/1M × 0.1 = $0.003 (90% savings)
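
To see how the read discount and write premium combine over several requests, here is a rough Python sketch. It assumes the first request writes the shared prefix to the cache and every later request reads it, which only holds while the cache entry stays warm. `shared_prefix_cost` is an illustrative helper, not a SandBase API.

```python
from typing import Tuple


def shared_prefix_cost(
    prefix_tokens: int,
    input_price_per_m: float,
    n_requests: int,
    read_multiplier: float,
    write_multiplier: float = 1.0,
) -> Tuple[float, float]:
    """Return (uncached_cost, cached_cost) in USD for a prefix reused n times."""
    if n_requests < 1:
        raise ValueError("n_requests must be at least 1")
    base = prefix_tokens * input_price_per_m / 1_000_000
    uncached = base * n_requests
    # Assumption: one cache write on the first request, cache reads afterwards.
    cached = base * write_multiplier + base * read_multiplier * (n_requests - 1)
    return uncached, cached


# Anthropic multipliers from the table above, claude-sonnet-4 input price.
uncached, cached = shared_prefix_cost(
    10_000, 3.00, n_requests=10, read_multiplier=0.1, write_multiplier=1.25
)
print(f"without cache: ${uncached:.4f}, with cache: ${cached:.4f}")
# without cache: $0.3000, with cache: $0.0645
```

Under these assumptions a single Anthropic request costs more with caching (1.25× versus 1×), and the write premium is recovered on the second request (1.25 + 0.1 = 1.35× versus 2×).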

Usage Tracking

Dashboard

The Dashboard → Usage page shows:

  • Daily/weekly/monthly token consumption
  • Cost breakdown by model
  • Request count and average tokens per request
  • Trend charts for cost forecasting

API

Query your usage programmatically:

bash
curl "https://api.sandbase.ai/v1/account/usage?start=2024-01-01&end=2024-01-31" \
  -H "Authorization: Bearer sk-sb-your-key"

Response:

json
{
  "total_cost": 12.45,
  "total_tokens": 2450000,
  "by_model": [
    {
      "model": "gpt-4o",
      "input_tokens": 1200000,
      "output_tokens": 300000,
      "cost": 6.00
    },
    {
      "model": "claude-sonnet-4",
      "input_tokens": 500000,
      "output_tokens": 150000,
      "cost": 3.75
    }
  ]
}
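
A short Python sketch for working with this payload: it ranks models by cost and reports how much of `total_cost` is itemized under `by_model`. The field names come from the sample response above; `summarize_usage` is a hypothetical helper. In the sample, the two listed models account for $9.75 of the $12.45 total, so the sketch reports the remainder separately rather than assuming the list is complete.

```python
import json

# The sample response from above, condensed onto fewer lines.
SAMPLE_USAGE = json.loads("""
{
  "total_cost": 12.45,
  "total_tokens": 2450000,
  "by_model": [
    {"model": "gpt-4o", "input_tokens": 1200000, "output_tokens": 300000, "cost": 6.00},
    {"model": "claude-sonnet-4", "input_tokens": 500000, "output_tokens": 150000, "cost": 3.75}
  ]
}
""")


def summarize_usage(usage: dict) -> dict:
    """Rank models by cost and compute each model's share of the total."""
    total = usage["total_cost"]
    rows = sorted(usage["by_model"], key=lambda row: row["cost"], reverse=True)
    itemized = sum(row["cost"] for row in rows)
    return {
        "ranked": [
            (row["model"], row["cost"], row["cost"] / total if total else 0.0)
            for row in rows
        ],
        "itemized_cost": itemized,
        # Rounded to hide floating-point noise in the subtraction.
        "not_itemized_cost": round(total - itemized, 6),
    }


summary = summarize_usage(SAMPLE_USAGE)
for model, cost, share in summary["ranked"]:
    print(f"{model}: ${cost:.2f} ({share:.0%} of total)")
print(f"not itemized: ${summary['not_itemized_cost']:.2f}")
```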

Cost Optimization Tips

1. Choose the Right Model for the Task

Not every task needs GPT-4o or Claude Sonnet. Use smaller models for simple operations:

| Task | Recommended Model | Cost vs GPT-4o |
|---|---|---|
| Classification | gpt-4o-mini | 17x cheaper |
| Summarization | gemini-2.5-flash | 17x cheaper |
| Simple Q&A | claude-3.5-haiku | 3x cheaper |
| Code generation | claude-sonnet-4 | Similar |
| Complex reasoning | o3 | 4x more expensive |
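
The multipliers in the table can be reproduced from the list prices. The Python sketch below hard-codes a few prices copied from the Per-Model Pricing tables (so it goes stale if prices change) and compares each model's input and output price with gpt-4o. `PRICES` and `price_ratio` are illustrative names.

```python
# USD per 1M tokens as (input, output), copied from the pricing tables above.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gemini-2.5-flash": (0.15, 0.60),
    "claude-3.5-haiku": (0.80, 4.00),
    "claude-sonnet-4": (3.00, 15.00),
    "o3": (10.00, 40.00),
}


def price_ratio(model: str, baseline: str = "gpt-4o") -> tuple:
    """Return (input_ratio, output_ratio) of baseline price over model price.

    Values above 1 mean the model is cheaper than the baseline.
    """
    base_in, base_out = PRICES[baseline]
    model_in, model_out = PRICES[model]
    return base_in / model_in, base_out / model_out


for name in PRICES:
    in_ratio, out_ratio = price_ratio(name)
    print(f"{name}: input {in_ratio:.1f}x, output {out_ratio:.1f}x vs gpt-4o")
```

For gpt-4o-mini this gives about 16.7x on both input and output, which the table rounds to 17x. For o3 it gives 0.25x, meaning four times the gpt-4o price.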

2. Use Prompt Caching

For applications with repeated system prompts or context (RAG, agents), enable caching:

python
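from openai import OpenAI

# Assumption: the OpenAI SDK pointed at SandBase's OpenAI-compatible endpoint
client = OpenAI(base_url="https://api.sandbase.ai/v1", api_key="sk-sb-your-key")

long_system_prompt = "..."  # placeholder: your long, reusable system prompt
user_query = "..."          # placeholder: the per-request question

# Anthropic caching: mark the system prompt as cacheable
response = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[
        {
            "role": "system",
            "content": long_system_prompt,
            # Cache this across requests
        },
        {"role": "user", "content": user_query}
    ],
    # Provider-specific options are passed through in the request body
    extra_body={
        "anthropic": {
            "cache_control": [{"type": "ephemeral"}]
        }
    }
)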

3. Minimize Input Tokens

  • Trim unnecessary context from prompts
  • Use concise system prompts
  • Summarize conversation history instead of sending full transcripts
  • Remove redundant examples from few-shot prompts
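
One simple way to bound input tokens, sketched below in Python, is to keep the system prompt and only the most recent turns. This is plain truncation, a cheaper stand-in for the summarization suggested above, and it loses whatever the dropped turns contained. `trim_history` and its `keep_last` parameter are hypothetical names.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message], keep_last: int = 6) -> List[Message]:
    """Keep all system messages plus the last `keep_last` non-system messages."""
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    # Guard keep_last == 0: turns[-0:] would return the whole list.
    recent = turns[-keep_last:] if keep_last else []
    return system + recent


# Build a 10-exchange conversation to trim.
history = [{"role": "system", "content": "You are a concise assistant."}]
for i in range(10):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history, keep_last=4)
print(len(history), "->", len(trimmed))  # 21 -> 5
```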

4. Set Appropriate max_tokens

Set max_tokens no higher than the task needs. You only pay for tokens that are actually generated, but a tight limit caps the cost of runaway generation:

python
# For a yes/no classification, limit output
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Is this spam? 'Buy now!'"}],
    max_tokens=10  # Only need "yes" or "no"
)

5. Batch Similar Requests

Process multiple items in a single request when possible:

python
# Instead of 10 separate requests:
items = ["item1", "item2", "item3", ...]

# Send one request with all items
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Classify each item as positive/negative:\n"
                   + "\n".join(f"- {item}" for item in items)
    }]
)

6. Monitor and Set Budgets

Set spending alerts in the dashboard to avoid unexpected costs:

  1. Go to Dashboard → Billing → Alerts
  2. Set a daily or monthly spending threshold
  3. Receive email notifications when approaching the limit
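
Dashboard alerts can be complemented with a client-side check before starting a large job. The Python sketch below reuses the balance endpoint and response shape shown under Checking Your Balance. The `fetch_balance` and `has_headroom` helpers and the `reserve` threshold are assumptions for illustration, and the sketch uses only the standard library.

```python
import json
import urllib.request

BALANCE_URL = "https://api.sandbase.ai/v1/account/balance"


def fetch_balance(api_key: str) -> dict:
    """Fetch the balance payload, e.g. {"balance": 42.50, "currency": "USD"}."""
    request = urllib.request.Request(
        BALANCE_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def has_headroom(balance_payload: dict, estimated_job_cost: float, reserve: float = 1.0) -> bool:
    """Return True if the balance covers the job and leaves `reserve` USD spare."""
    return balance_payload["balance"] - estimated_job_cost >= reserve


# Offline check against the sample payload from the balance example.
sample = {"balance": 42.50, "currency": "USD"}
print(has_headroom(sample, estimated_job_cost=40.0))  # True
print(has_headroom(sample, estimated_job_cost=42.0))  # False
```

In a real job you would call `has_headroom(fetch_balance(api_key), estimate)` and skip or shrink the job when it returns False.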

Insufficient Credits

When your balance reaches zero, API requests return HTTP 402. On the OpenAI-compatible endpoint the body is:

json
{
  "error": {
    "message": "insufficient balance",
    "type": "invalid_request_error",
    "code": null,
    "param": null
  }
}

Top up your account to resume service. In-flight streaming requests are not interrupted; they complete normally.
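
A minimal Python sketch of detecting this condition on the client, assuming the 402 status and error body shown above. `InsufficientCreditsError` and `check_response` are hypothetical names; how you obtain the status code and body depends on your HTTP client or SDK.

```python
import json


class InsufficientCreditsError(RuntimeError):
    """Raised when the API reports an exhausted balance (HTTP 402)."""


def check_response(status_code: int, body: str) -> dict:
    """Parse a JSON response body, raising on HTTP 402."""
    payload = json.loads(body) if body else {}
    if status_code == 402:
        message = payload.get("error", {}).get("message", "insufficient balance")
        raise InsufficientCreditsError(f"{message}: top up credits to resume")
    return payload


# The error body from above, as a raw JSON string.
sample_body = (
    '{"error": {"message": "insufficient balance", '
    '"type": "invalid_request_error", "code": null, "param": null}}'
)

try:
    check_response(402, sample_body)
except InsufficientCreditsError as exc:
    print(f"Stopping job: {exc}")
```

Retrying a 402 cannot succeed until credits are added, so it is reasonable to treat this error as non-retryable and stop the job instead of backing off.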