Getting Started
LlamaGate provides an OpenAI-compatible API for accessing open-source language models. You can use the official OpenAI SDK or any HTTP client to make requests.
> **Tip:** Configure your client with LlamaGate's base URL; all endpoints below are relative to it.
Install the SDK
Install the OpenAI SDK for your preferred language, for example `pip install openai` for Python or `npm install openai` for Node.js.
Authentication
All API requests require authentication using a Bearer token. You can create API keys from your dashboard.
> **Warning:** API keys start with the `llg_sk_` prefix and must be kept secret. Create your key in the API Keys dashboard, and never expose it in client-side code.
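Every request sends the key in an `Authorization` header. A minimal sketch (the key and base URL below are placeholders, not real values):

```python
# Placeholders only; substitute your own key and deployment's base URL.
API_KEY = "llg_sk_your_key_here"
BASE_URL = "https://your-llamagate-host/v1"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```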
Chat Completions
The chat completions endpoint is the primary way to interact with language models. Send a list of messages and receive a model-generated response.
Try this example to see a chat completion response
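A minimal sketch using only the standard library; the base URL, key, and model ID are placeholders (list the available models to find real IDs), and the network call is left commented so the snippet can run offline:

```python
import json
from urllib import request

BASE_URL = "https://your-llamagate-host/v1"  # placeholder base URL
API_KEY = "llg_sk_your_key_here"             # placeholder key

payload = {
    "model": "llama-3.1-8b-instruct",  # hypothetical model ID
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello in one sentence."},
    ],
    "temperature": 0.7,
    "max_tokens": 128,
}

req = request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
)

# Uncomment to send the request against a live deployment:
# with request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```

The reply text lives at `choices[0].message.content` in the response body, following the OpenAI response shape.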
Parameters
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Model ID to use (required) |
| `messages` | array | List of messages in the conversation (required) |
| `temperature` | number | Sampling temperature (0-2, default: 1) |
| `max_tokens` | integer | Maximum tokens to generate |
| `stream` | boolean | Enable streaming responses |
| `top_p` | number | Nucleus sampling parameter (0-1) |
Streaming Responses
For a better user experience, you can stream responses token by token. This is especially useful for chat interfaces.
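With `"stream": true` set in the request, the response arrives as server-sent events, each `data:` line carrying a JSON chunk whose `delta` holds newly generated tokens. A client-side parsing sketch, assuming the OpenAI streaming chunk format that an OpenAI-compatible API mirrors:

```python
import json

def parse_sse_line(line: str):
    """Return the decoded JSON chunk for one 'data: ...' line, or None."""
    line = line.strip()
    if not line.startswith("data: "):
        return None
    data = line[len("data: "):]
    if data == "[DONE]":  # sentinel marking the end of the stream
        return None
    return json.loads(data)

# Example chunk in the OpenAI streaming format:
chunk = parse_sse_line('data: {"choices": [{"delta": {"content": "Hel"}}]}')
print(chunk["choices"][0]["delta"]["content"])  # prints: Hel
```

Concatenating each chunk's `delta.content` as it arrives reproduces the full reply.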
Tool Calling (Function Calling)
Tool calling allows the model to request external function calls. This is useful for building agents, retrieving real-time data, or performing actions.
Defining Tools
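Each tool is declared with a name, a description, and a JSON Schema for its parameters. A sketch using a hypothetical `get_weather` function, following the OpenAI tools format:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function, for illustration only
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]
# Pass `tools` alongside `model` and `messages` in the chat completion request.
```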
Handling Tool Calls
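When the model decides to use a tool, the assistant message contains a `tool_calls` array instead of text; your code runs the function and sends the result back as a `tool` message. A local sketch (the `get_weather` stand-in and the message shape follow the OpenAI tool-call format):

```python
import json

def get_weather(city: str) -> dict:
    """Stand-in implementation for the hypothetical tool."""
    return {"city": city, "temp_c": 21}

# An assistant message shaped like an OpenAI-format tool-call response:
assistant_message = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }
    ],
}

tool_messages = []
for call in assistant_message.get("tool_calls", []):
    args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
    result = get_weather(**args)
    tool_messages.append(
        {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
    )
# Append `tool_messages` to the conversation and call the API again for the final answer.
```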
JSON Mode & Structured Outputs
JSON mode ensures the model outputs valid JSON. For even more control, use structured outputs to define an exact JSON schema the response must follow.
Basic JSON Mode
Use {"type": "json_object"} to ensure valid JSON output:
Structured Outputs (JSON Schema)
For guaranteed response structure, provide a JSON schema. The model will strictly follow the schema, ensuring type safety and required fields.
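A request sketch following the OpenAI `json_schema` response format (model ID and schema contents are illustrative):

```python
payload = {
    "model": "llama-3.1-8b-instruct",  # hypothetical model ID
    "messages": [{"role": "user", "content": "Extract the event details."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "event",
            "strict": True,  # enforce exact schema compliance
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "date": {"type": "string"},
                },
                "required": ["title", "date"],
                "additionalProperties": False,
            },
        },
    },
}
```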
> **Tip:** Use `strict: true` for guaranteed schema compliance in production.

Vision (Image Input)
Vision-capable models can analyze images. Pass images as base64-encoded data or URLs.
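A sketch of a multimodal user message with a base64-encoded image, following the OpenAI content-parts format (the image bytes here are a stand-in; in practice you would read a real file):

```python
import base64

# In practice, read and encode a real image file:
# with open("photo.jpg", "rb") as f:
#     b64 = base64.b64encode(f.read()).decode("ascii")
b64 = base64.b64encode(b"\xff\xd8\xff").decode("ascii")  # stand-in bytes for the sketch

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
    ],
}
```

A plain `https://` image URL can be passed in the `image_url` field instead of a data URL.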
Embeddings
Generate vector embeddings for text. Useful for semantic search, clustering, and RAG applications.
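A request sketch for the embeddings endpoint (the model ID is a placeholder):

```python
payload = {
    "model": "nomic-embed-text",  # hypothetical embedding model ID
    "input": "The quick brown fox jumps over the lazy dog.",
}
# POST this to /embeddings; the vector is at response["data"][0]["embedding"].
```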
Batch Embeddings
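To embed several texts in one request, pass `input` as an array; the response's `data` array then contains one embedding per input, matched by its `index` field. A sketch:

```python
payload = {
    "model": "nomic-embed-text",  # hypothetical embedding model ID
    "input": [
        "First document.",
        "Second document.",
        "Third document.",
    ],
}
# response["data"] holds one embedding per input, in the same order.
```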
Available Models
List all available models via the API or view them on the pricing page.
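A sketch of the listing call (base URL and key are placeholders; the network call is commented so the snippet runs offline):

```python
from urllib import request

BASE_URL = "https://your-llamagate-host/v1"  # placeholder base URL
API_KEY = "llg_sk_your_key_here"             # placeholder key

req = request.Request(
    f"{BASE_URL}/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)

# Uncomment against a live deployment:
# import json
# with request.urlopen(req) as resp:
#     for m in json.load(resp)["data"]:
#         print(m["id"])
```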
Model Categories
| Category | Examples | Best For |
|---|---|---|
| General Purpose | Llama 3.1, Qwen, Mistral | Everyday tasks, chat, writing |
| Code | CodeGemma, DeepSeek Coder | Programming, code review |
| Reasoning | DeepSeek R1, OpenThinker | Complex problem-solving |
| Vision | LLaVA, Qwen VL | Image understanding |
| Embeddings | Nomic, Qwen Embedding | Vector search, RAG |
Error Handling
The API returns standard HTTP status codes and JSON error responses.
Error Codes
| Status | Description |
|---|---|
| 400 | Bad request - check your parameters |
| 401 | Unauthorized - invalid or missing API key |
| 402 | Payment required - insufficient credits |
| 404 | Not found - model does not exist |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
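A sketch of turning a status code and error body into a readable message; the `{"error": {"message": ...}}` body shape is an assumption based on the OpenAI error format, not confirmed above:

```python
import json

def parse_error(status: int, body: str) -> str:
    """Format an HTTP status and JSON error body as a readable message (sketch;
    the {"error": {"message": ...}} shape is assumed, not documented here)."""
    try:
        message = json.loads(body).get("error", {}).get("message", body)
    except json.JSONDecodeError:
        message = body
    retryable = status == 429 or status >= 500  # transient errors worth retrying
    return f"{status}: {message}" + (" (retryable)" if retryable else "")

print(parse_error(429, '{"error": {"message": "Rate limit exceeded"}}'))
# prints: 429: Rate limit exceeded (retryable)
```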
Rate Limits
Rate limits ensure fair usage and service stability. Limits are applied per API key.
| Limit Type | Value |
|---|---|
| Requests per minute | 60 RPM |
| Tokens per minute | 100,000 TPM |
| Concurrent requests | 10 |
Rate limit headers are included in API responses:
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests allowed |
| `X-RateLimit-Remaining` | Remaining requests |
| `X-RateLimit-Reset` | Time when limit resets |
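These headers can drive client-side backoff. A sketch, assuming `X-RateLimit-Reset` is a Unix timestamp (check your deployment's actual format):

```python
import time

def wait_if_exhausted(headers: dict) -> float:
    """Return seconds to sleep before the next request, based on rate-limit
    headers. Assumes X-RateLimit-Reset is a Unix timestamp (an assumption)."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    reset = float(headers.get("X-RateLimit-Reset", time.time()))
    return max(0.0, reset - time.time())

# With requests still remaining, no wait is needed:
print(wait_if_exhausted({"X-RateLimit-Remaining": "5"}))  # prints: 0.0
```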