Documentation

Quick Start Guide

Learn how to integrate LlamaGate into your application. Our API is fully compatible with the OpenAI SDK.

Get started in 60 seconds

Three simple steps to your first API call

Step 1

Install SDK

pip install openai

Step 2

Get API Key

Create key in dashboard

Step 3

Make Request

Call chat.completions.create()

Getting Started

LlamaGate provides an OpenAI-compatible API for accessing open-source language models. You can use the official OpenAI SDK or any HTTP client to make requests.

Tip

You can test the API instantly using our interactive demo on the homepage.

Base URL

Install the SDK

Install the OpenAI SDK for your preferred language:

Authentication

All API requests require authentication using a Bearer token. You can create API keys from your dashboard.

cURL

Warning

Never expose your API key in client-side code. Use environment variables or a backend proxy.

API keys start with llg_sk_ prefix. Create your key in the API Keys dashboard.

Chat Completions

The chat completions endpoint is the primary way to interact with language models. Send a list of messages and receive a model-generated response.

Try this example to see a chat completion response

Python
Response
Click "Try it" to see the response

Parameters

ParameterTypeDescription
modelstringModel ID to use (required)
messagesarrayList of messages in the conversation (required)
temperaturenumberSampling temperature (0-2, default: 1)
max_tokensintegerMaximum tokens to generate
streambooleanEnable streaming responses
top_pnumberNucleus sampling parameter (0-1)

Streaming Responses

For a better user experience, you can stream responses token by token. This is especially useful for chat interfaces.

Tool Calling (Function Calling)

Tool calling allows the model to request external function calls. This is useful for building agents, retrieving real-time data, or performing actions.

Note

Not all models support tool calling. Check the "Tools" badge on the pricing page for supported models.

Defining Tools

Handling Tool Calls

JSON Mode & Structured Outputs

JSON mode ensures the model outputs valid JSON. For even more control, use structured outputs to define an exact JSON schema the response must follow.

Basic JSON Mode

Use {"type": "json_object"} to ensure valid JSON output:

Structured Outputs (JSON Schema)

For guaranteed response structure, provide a JSON schema. The model will strictly follow the schema, ensuring type safety and required fields.

Tip

Use strict: true for guaranteed schema compliance in production.

Vision (Image Input)

Vision-capable models can analyze images. Pass images as base64-encoded data or URLs.

Note

Models with the "Vision" badge on the pricing page support image input.

Embeddings

Generate vector embeddings for text. Useful for semantic search, clustering, and RAG applications.

Batch Embeddings

Available Models

List all available models via the API or view them on the pricing page.

Model Categories

CategoryExamplesBest For
General PurposeLlama 3.1, Qwen, MistralEveryday tasks, chat, writing
CodeCodeGemma, DeepSeek CoderProgramming, code review
ReasoningDeepSeek R1, OpenThinkerComplex problem-solving
VisionLLaVA, Qwen VLImage understanding
EmbeddingsNomic, Qwen EmbeddingVector search, RAG

Error Handling

The API returns standard HTTP status codes and JSON error responses.

Error Codes

StatusDescription
400Bad request - check your parameters
401Unauthorized - invalid or missing API key
402Payment required - insufficient credits
404Not found - model does not exist
429Rate limit exceeded
500Internal server error

Rate Limits

Rate limits ensure fair usage and service stability. Limits are applied per API key.

Limit TypeValue
Requests per minute60 RPM
Tokens per minute100,000 TPM
Concurrent requests10

Rate limit headers are included in API responses:

HeaderDescription
X-RateLimit-LimitMaximum requests allowed
X-RateLimit-RemainingRemaining requests
X-RateLimit-ResetTime when limit resets

Ready to Get Started?

Create an account and start building with just $5.

On this page