Chat API
POST
https://api.rana.ai/v1/chat/completions
The Chat API allows you to analyze text, generate new content, and hold multi-turn conversations with a variety of open-source LLMs.
Request Body
Required Parameters
Parameter | Type | Description |
---|---|---|
`messages` | Array | An array of message objects comprising the conversation history. Each message has a `role` and `content`. Supported roles are `system`, `user`, and `assistant`. |
`model` | String | Identifier of the model to chat with (e.g., `DeepSeek-R1-Distill-Llama-8B-q4f32_1-MLC`). |
Optional Parameters
Parameter | Type | Default | Description |
---|---|---|---|
`frequency_penalty` | Number | 0 | Penalty for frequent tokens, between -2.0 and 2.0. |
`logit_bias` | Object | null | Adjusts the likelihood of specific tokens being selected by mapping token IDs to bias values between -100 and 100. |
`max_completion_tokens` | Integer | null | Maximum number of tokens that can be generated for a completion, including visible and reasoning tokens. |
`n` | Integer | 1 | Number of chat completion choices to generate. |
`presence_penalty` | Number | 0 | Penalty for repeated tokens, between -2.0 and 2.0. |
`stream` | Boolean | false | If true, partial message deltas are streamed back as they are generated. |
`stop` | String \| Array | null | Up to 4 sequences at which the API will stop generating further tokens. |
`temperature` | Number | 1 | Randomness of the output, between 0 and 2; lower values produce more focused output. |
`tools` | Array | null | A list of functions that the model may call during generation. |
`tool_choice` | String \| Object | none | Controls tool selection behavior. Options: `none`, `auto`, `required`. |
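A request body combining several of the optional parameters above might look like the following sketch (the parameter values are illustrative, not recommendations):

```json
{
  "model": "Llama-3.2-1B-Instruct-q4f32_1-MLC",
  "messages": [
    { "role": "system", "content": "You are a concise assistant." },
    { "role": "user", "content": "Summarize the water cycle." }
  ],
  "temperature": 0.7,
  "max_completion_tokens": 256,
  "presence_penalty": 0.5,
  "stop": ["\n\n"],
  "stream": false
}
```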
Available Model IDs
DeepSeek Models
Model Name | Description |
---|---|
`DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC` | DeepSeek R1 Distilled Qwen 7B Model (4-bit quantized, 16-bit float) |
`DeepSeek-R1-Distill-Qwen-7B-q4f32_1-MLC` | DeepSeek R1 Distilled Qwen 7B Model (4-bit quantized, 32-bit float) |
`DeepSeek-R1-Distill-Llama-8B-q4f32_1-MLC` | DeepSeek R1 Distilled Llama 8B Model (4-bit quantized, 32-bit float) |
`DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC` | DeepSeek R1 Distilled Llama 8B Model (4-bit quantized, 16-bit float) |
Llama Models
Model Name | Description |
---|---|
`Llama-3.2-1B-Instruct-q4f32_1-MLC` | Llama 3.2 1B Instruct Model (4-bit quantized, 32-bit float) |
`Llama-3.2-1B-Instruct-q4f16_1-MLC` | Llama 3.2 1B Instruct Model (4-bit quantized, 16-bit float) |
`Llama-3.2-1B-Instruct-q0f32-MLC` | Llama 3.2 1B Instruct Model (32-bit float) |
`Llama-3.2-1B-Instruct-q0f16-MLC` | Llama 3.2 1B Instruct Model (16-bit float) |
`Llama-3.2-3B-Instruct-q4f32_1-MLC` | Llama 3.2 3B Instruct Model (4-bit quantized, 32-bit float) |
`Llama-3.2-3B-Instruct-q4f16_1-MLC` | Llama 3.2 3B Instruct Model (4-bit quantized, 16-bit float) |
`Llama-3.1-8B-Instruct-q4f32_1-MLC-1k` | Llama 3.1 8B Instruct Model (4-bit quantized, 32-bit float, 1k context) |
`Llama-3.1-8B-Instruct-q4f16_1-MLC-1k` | Llama 3.1 8B Instruct Model (4-bit quantized, 16-bit float, 1k context) |
`Llama-3.1-8B-Instruct-q4f32_1-MLC` | Llama 3.1 8B Instruct Model (4-bit quantized, 32-bit float) |
`Llama-3.1-8B-Instruct-q4f16_1-MLC` | Llama 3.1 8B Instruct Model (4-bit quantized, 16-bit float) |
`Llama-3-8B-Instruct-q4f32_1-MLC-1k` | Llama 3 8B Instruct Model (4-bit quantized, 32-bit float, 1k context) |
`Llama-3-8B-Instruct-q4f16_1-MLC-1k` | Llama 3 8B Instruct Model (4-bit quantized, 16-bit float, 1k context) |
`Llama-3-8B-Instruct-q4f32_1-MLC` | Llama 3 8B Instruct Model (4-bit quantized, 32-bit float) |
`Llama-3-8B-Instruct-q4f16_1-MLC` | Llama 3 8B Instruct Model (4-bit quantized, 16-bit float) |
`Llama-3-70B-Instruct-q3f16_1-MLC` | Llama 3 70B Instruct Model (3-bit quantized, 16-bit float) |
`Llama-3.1-70B-Instruct-q3f16_1-MLC` | Llama 3.1 70B Instruct Model (3-bit quantized, 16-bit float) |
`Llama-2-7b-chat-hf-q4f32_1-MLC-1k` | Llama 2 7B Chat Model (4-bit quantized, 32-bit float, 1k context) |
`Llama-2-7b-chat-hf-q4f16_1-MLC-1k` | Llama 2 7B Chat Model (4-bit quantized, 16-bit float, 1k context) |
`Llama-2-7b-chat-hf-q4f32_1-MLC` | Llama 2 7B Chat Model (4-bit quantized, 32-bit float) |
`Llama-2-7b-chat-hf-q4f16_1-MLC` | Llama 2 7B Chat Model (4-bit quantized, 16-bit float) |
`Llama-2-13b-chat-hf-q4f16_1-MLC` | Llama 2 13B Chat Model (4-bit quantized, 16-bit float) |
Mistral & Hermes Models
Model Name | Description |
---|---|
`Mistral-7B-Instruct-v0.3-q4f16_1-MLC` | Mistral 7B Instruct v0.3 (4-bit quantized, 16-bit float) |
`Mistral-7B-Instruct-v0.3-q4f32_1-MLC` | Mistral 7B Instruct v0.3 (4-bit quantized, 32-bit float) |
`Mistral-7B-Instruct-v0.2-q4f16_1-MLC` | Mistral 7B Instruct v0.2 (4-bit quantized, 16-bit float) |
`Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC` | Hermes 2 Pro Llama 3 8B Model (4-bit quantized, 16-bit float) |
`Hermes-2-Pro-Llama-3-8B-q4f32_1-MLC` | Hermes 2 Pro Llama 3 8B Model (4-bit quantized, 32-bit float) |
`Hermes-2-Pro-Mistral-7B-q4f16_1-MLC` | Hermes 2 Pro Mistral 7B Model (4-bit quantized, 16-bit float) |
`OpenHermes-2.5-Mistral-7B-q4f16_1-MLC` | OpenHermes 2.5 Mistral 7B Model (4-bit quantized, 16-bit float) |
`NeuralHermes-2.5-Mistral-7B-q4f16_1-MLC` | NeuralHermes 2.5 Mistral 7B Model (4-bit quantized, 16-bit float) |
Phi Models
Model Name | Description |
---|---|
`Phi-3-mini-4k-instruct-q4f16_1-MLC` | Phi-3 Mini 4K Instruct Model (4-bit quantized, 16-bit float) |
`Phi-3-mini-4k-instruct-q4f32_1-MLC` | Phi-3 Mini 4K Instruct Model (4-bit quantized, 32-bit float) |
`Phi-3-mini-4k-instruct-q4f16_1-MLC-1k` | Phi-3 Mini 4K Instruct Model (4-bit quantized, 16-bit float, 1k context) |
`Phi-3-mini-4k-instruct-q4f32_1-MLC-1k` | Phi-3 Mini 4K Instruct Model (4-bit quantized, 32-bit float, 1k context) |
`phi-2-q4f16_1-MLC` | Phi-2 Model (4-bit quantized, 16-bit float) |
`phi-2-q4f32_1-MLC` | Phi-2 Model (4-bit quantized, 32-bit float) |
`phi-2-q4f16_1-MLC-1k` | Phi-2 Model (4-bit quantized, 16-bit float, 1k context) |
`phi-2-q4f32_1-MLC-1k` | Phi-2 Model (4-bit quantized, 32-bit float, 1k context) |
`phi-1_5-q4f16_1-MLC` | Phi-1.5 Model (4-bit quantized, 16-bit float) |
`phi-1_5-q4f32_1-MLC` | Phi-1.5 Model (4-bit quantized, 32-bit float) |
`phi-1_5-q4f16_1-MLC-1k` | Phi-1.5 Model (4-bit quantized, 16-bit float, 1k context) |
`phi-1_5-q4f32_1-MLC-1k` | Phi-1.5 Model (4-bit quantized, 32-bit float, 1k context) |
RedPajama Models
Model Name | Description |
---|---|
`RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC` | RedPajama Chat 3B Model (4-bit quantized, 16-bit float) |
`RedPajama-INCITE-Chat-3B-v1-q4f32_1-MLC` | RedPajama Chat 3B Model (4-bit quantized, 32-bit float) |
`RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC-1k` | RedPajama Chat 3B Model (4-bit quantized, 16-bit float, 1k context) |
`RedPajama-INCITE-Chat-3B-v1-q4f32_1-MLC-1k` | RedPajama Chat 3B Model (4-bit quantized, 32-bit float, 1k context) |
SmolLM Models
Model Name | Description |
---|---|
`SmolLM2-1.7B-Instruct-q4f16_1-MLC` | SmolLM2 1.7B Instruct Model (4-bit quantized, 16-bit float) |
`SmolLM2-1.7B-Instruct-q4f32_1-MLC` | SmolLM2 1.7B Instruct Model (4-bit quantized, 32-bit float) |
`SmolLM2-360M-Instruct-q0f16-MLC` | SmolLM2 360M Instruct Model (16-bit float) |
`SmolLM2-360M-Instruct-q0f32-MLC` | SmolLM2 360M Instruct Model (32-bit float) |
`SmolLM2-360M-Instruct-q4f16_1-MLC` | SmolLM2 360M Instruct Model (4-bit quantized, 16-bit float) |
`SmolLM2-360M-Instruct-q4f32_1-MLC` | SmolLM2 360M Instruct Model (4-bit quantized, 32-bit float) |
`SmolLM2-135M-Instruct-q0f16-MLC` | SmolLM2 135M Instruct Model (16-bit float) |
`SmolLM2-135M-Instruct-q0f32-MLC` | SmolLM2 135M Instruct Model (32-bit float) |
TinyLlama Models
Model Name | Description |
---|---|
`TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC` | TinyLlama Chat 1.1B v1.0 Model (4-bit quantized, 16-bit float) |
`TinyLlama-1.1B-Chat-v1.0-q4f32_1-MLC` | TinyLlama Chat 1.1B v1.0 Model (4-bit quantized, 32-bit float) |
`TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC-1k` | TinyLlama Chat 1.1B v1.0 Model (4-bit quantized, 16-bit float, 1k context) |
`TinyLlama-1.1B-Chat-v1.0-q4f32_1-MLC-1k` | TinyLlama Chat 1.1B v1.0 Model (4-bit quantized, 32-bit float, 1k context) |
`TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC` | TinyLlama Chat 1.1B v0.4 Model (4-bit quantized, 16-bit float) |
`TinyLlama-1.1B-Chat-v0.4-q4f32_1-MLC` | TinyLlama Chat 1.1B v0.4 Model (4-bit quantized, 32-bit float) |
`TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC-1k` | TinyLlama Chat 1.1B v0.4 Model (4-bit quantized, 16-bit float, 1k context) |
`TinyLlama-1.1B-Chat-v0.4-q4f32_1-MLC-1k` | TinyLlama Chat 1.1B v0.4 Model (4-bit quantized, 32-bit float, 1k context) |
Qwen2 Models
Model Name | Description |
---|---|
`Qwen2-0.5B-Instruct-q4f16_1-MLC` | Qwen2 0.5B Instruct Model (4-bit quantized, 16-bit float) |
`Qwen2-0.5B-Instruct-q0f16-MLC` | Qwen2 0.5B Instruct Model (16-bit float) |
`Qwen2-0.5B-Instruct-q0f32-MLC` | Qwen2 0.5B Instruct Model (32-bit float) |
`Qwen2-1.5B-Instruct-q4f16_1-MLC` | Qwen2 1.5B Instruct Model (4-bit quantized, 16-bit float) |
`Qwen2-1.5B-Instruct-q4f32_1-MLC` | Qwen2 1.5B Instruct Model (4-bit quantized, 32-bit float) |
`Qwen2-7B-Instruct-q4f16_1-MLC` | Qwen2 7B Instruct Model (4-bit quantized, 16-bit float) |
`Qwen2-7B-Instruct-q4f32_1-MLC` | Qwen2 7B Instruct Model (4-bit quantized, 32-bit float) |
Other Models
Model Name | Description |
---|---|
`gemma-2b-it-q4f16_1-MLC` | Gemma 2B Instruct Model (4-bit quantized, 16-bit float) |
`gemma-2b-it-q4f32_1-MLC` | Gemma 2B Instruct Model (4-bit quantized, 32-bit float) |
`gemma-2b-it-q4f16_1-MLC-1k` | Gemma 2B Instruct Model (4-bit quantized, 16-bit float, 1k context) |
`gemma-2b-it-q4f32_1-MLC-1k` | Gemma 2B Instruct Model (4-bit quantized, 32-bit float, 1k context) |
`stablelm-2-zephyr-1_6b-q4f16_1-MLC` | StableLM 2 Zephyr 1.6B Model (4-bit quantized, 16-bit float) |
`stablelm-2-zephyr-1_6b-q4f32_1-MLC` | StableLM 2 Zephyr 1.6B Model (4-bit quantized, 32-bit float) |
`stablelm-2-zephyr-1_6b-q4f16_1-MLC-1k` | StableLM 2 Zephyr 1.6B Model (4-bit quantized, 16-bit float, 1k context) |
`stablelm-2-zephyr-1_6b-q4f32_1-MLC-1k` | StableLM 2 Zephyr 1.6B Model (4-bit quantized, 32-bit float, 1k context) |
`WizardMath-7B-V1.1-q4f16_1-MLC` | WizardMath 7B Model (4-bit quantized, 16-bit float) |
Response Format (Non-Streaming)
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "Llama-3.2-1B-Instruct-q4f32_1-MLC",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```
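Reading the reply and token counts out of this payload is straightforward. A minimal Python sketch, using the sample body above as the received response:

```python
import json

# The non-streaming response body shown above, as received from the API.
body = '''{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "Llama-3.2-1B-Instruct-q4f32_1-MLC",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}
}'''

response = json.loads(body)
# The assistant's reply lives under choices[0].message.content.
reply = response["choices"][0]["message"]["content"]
total = response["usage"]["total_tokens"]
print(reply)  # Hello! How can I help you today?
print(total)  # 21
```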
Response Format (Streaming)
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"Llama-3.2-1B-Instruct-q4f32_1-MLC", "choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"Llama-3.2-1B-Instruct-q4f32_1-MLC", "choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}
....
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"Llama-3.2-1B-Instruct-q4f32_1-MLC", "choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}
Examples:
Local Non-Streaming Request (cURL)
```shell
curl --http1.1 http://localhost:6969/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "stream": false,
    "model": "Llama-3.2-3B-Instruct-q4f16_1-MLC",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Can you tell me the meaning of life?"
      }
    ]
  }'
```
Local Streaming Request (cURL)
```shell
curl --http1.1 -N http://localhost:6969/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "stream": true,
    "model": "Llama-3.2-3B-Instruct-q4f16_1-MLC",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Can you tell me the meaning of life?"
      }
    ]
  }'
```
Remote Request (cURL)
```shell
curl --http1.1 -N https://api.rana.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "stream": true,
    "model": "Llama-3.2-3B-Instruct-q4f16_1-MLC",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Can you tell me the meaning of life?"
      }
    ]
  }'
```
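The same request can be issued from Python. This is an illustrative sketch, not an official client: the `build_chat_request` helper is hypothetical, and `requests` is an assumption (any HTTP client works). The network call is left commented out so the snippet runs without access to the API.

```python
import json

def build_chat_request(model: str, user_prompt: str, stream: bool = False) -> dict:
    """Assemble a request body for POST /v1/chat/completions (hypothetical helper)."""
    return {
        "stream": stream,
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
    }

payload = build_chat_request("Llama-3.2-3B-Instruct-q4f16_1-MLC",
                             "Can you tell me the meaning of life?")
print(json.dumps(payload, indent=2))

# To actually send it (untested sketch):
#   import requests
#   r = requests.post("https://api.rana.ai/v1/chat/completions",
#                     json=payload, headers={"Content-Type": "application/json"})
#   print(r.json()["choices"][0]["message"]["content"])
```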