Chat API

POST https://api.rana.ai/v1/chat/completions

The Chat API lets you analyze existing text, generate new text, and hold dynamic, multi-turn conversations with a variety of open-source LLMs.

Request Body


Required Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| messages | Array | An array of messages comprising the conversation history. Each message is an object with a role and content. Supported roles are: system, user, and assistant. |
| model | String | ID of the model to use for the completion (e.g., DeepSeek-R1-Distill-Llama-8B-q4f32_1-MLC). |
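For reference, a minimal request body using only the required parameters (the model ID here is one of the IDs listed under Available Model IDs below):

{
  "model": "Llama-3.2-1B-Instruct-q4f32_1-MLC",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ]
}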

Optional Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| frequency_penalty | Number | 0 | Penalty applied to frequently used tokens; a value between -2.0 and 2.0. |
| logit_bias | Object | null | Maps token IDs to bias values between -100 and 100 to adjust the likelihood of each token being selected. |
| max_completion_tokens | Integer | null | Maximum number of tokens that can be generated for a completion, including visible and reasoning tokens. |
| n | Integer | 1 | Number of chat completion choices to generate. |
| presence_penalty | Number | 0 | Penalty applied to tokens that have already appeared; a value between -2.0 and 2.0. |
| stream | Boolean | false | If true, partial message deltas are streamed back as they are generated. |
| stop | String or Array | null | Up to 4 sequences at which the API stops generating further tokens. |
| temperature | Number | 1 | Sampling randomness; a value between 0 and 2, where lower values produce more focused output. |
| tools | Array | null | A list of functions the model may call during generation. |
| tool_choice | String or Object | none | Controls tool selection behavior. Options: none, auto, required. |
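This document does not define the schema for tools; the sketch below assumes the widely used OpenAI-style function-calling format, and get_weather is a hypothetical function shown only for illustration:

{
  "model": "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC",
  "messages": [
    { "role": "user", "content": "What is the weather in Paris?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Hypothetical: returns the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}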

Available Model IDs


DeepSeek Models
| Model Name | Description |
| --- | --- |
| DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC | DeepSeek R1 Distilled Qwen 7B Model (4-bit quantized, 16-bit float) |
| DeepSeek-R1-Distill-Qwen-7B-q4f32_1-MLC | DeepSeek R1 Distilled Qwen 7B Model (4-bit quantized, 32-bit float) |
| DeepSeek-R1-Distill-Llama-8B-q4f32_1-MLC | DeepSeek R1 Distilled Llama 8B Model (4-bit quantized, 32-bit float) |
| DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC | DeepSeek R1 Distilled Llama 8B Model (4-bit quantized, 16-bit float) |

Llama Models
| Model Name | Description |
| --- | --- |
| Llama-3.2-1B-Instruct-q4f32_1-MLC | Llama 3.2 1B Instruct Model (4-bit quantized, 32-bit float) |
| Llama-3.2-1B-Instruct-q4f16_1-MLC | Llama 3.2 1B Instruct Model (4-bit quantized, 16-bit float) |
| Llama-3.2-1B-Instruct-q0f32-MLC | Llama 3.2 1B Instruct Model (32-bit float) |
| Llama-3.2-1B-Instruct-q0f16-MLC | Llama 3.2 1B Instruct Model (16-bit float) |
| Llama-3.2-3B-Instruct-q4f32_1-MLC | Llama 3.2 3B Instruct Model (4-bit quantized, 32-bit float) |
| Llama-3.2-3B-Instruct-q4f16_1-MLC | Llama 3.2 3B Instruct Model (4-bit quantized, 16-bit float) |
| Llama-3.1-8B-Instruct-q4f32_1-MLC-1k | Llama 3.1 8B Instruct Model (4-bit quantized, 32-bit float, 1k context) |
| Llama-3.1-8B-Instruct-q4f16_1-MLC-1k | Llama 3.1 8B Instruct Model (4-bit quantized, 16-bit float, 1k context) |
| Llama-3.1-8B-Instruct-q4f32_1-MLC | Llama 3.1 8B Instruct Model (4-bit quantized, 32-bit float) |
| Llama-3.1-8B-Instruct-q4f16_1-MLC | Llama 3.1 8B Instruct Model (4-bit quantized, 16-bit float) |
| Llama-3-8B-Instruct-q4f32_1-MLC-1k | Llama 3 8B Instruct Model (4-bit quantized, 32-bit float, 1k context) |
| Llama-3-8B-Instruct-q4f16_1-MLC-1k | Llama 3 8B Instruct Model (4-bit quantized, 16-bit float, 1k context) |
| Llama-3-8B-Instruct-q4f32_1-MLC | Llama 3 8B Instruct Model (4-bit quantized, 32-bit float) |
| Llama-3-8B-Instruct-q4f16_1-MLC | Llama 3 8B Instruct Model (4-bit quantized, 16-bit float) |
| Llama-3-70B-Instruct-q3f16_1-MLC | Llama 3 70B Instruct Model (3-bit quantized, 16-bit float) |
| Llama-3.1-70B-Instruct-q3f16_1-MLC | Llama 3.1 70B Instruct Model (3-bit quantized, 16-bit float) |
| Llama-2-7b-chat-hf-q4f32_1-MLC-1k | Llama 2 7B Chat Model (4-bit quantized, 32-bit float, 1k context) |
| Llama-2-7b-chat-hf-q4f16_1-MLC-1k | Llama 2 7B Chat Model (4-bit quantized, 16-bit float, 1k context) |
| Llama-2-7b-chat-hf-q4f32_1-MLC | Llama 2 7B Chat Model (4-bit quantized, 32-bit float) |
| Llama-2-7b-chat-hf-q4f16_1-MLC | Llama 2 7B Chat Model (4-bit quantized, 16-bit float) |
| Llama-2-13b-chat-hf-q4f16_1-MLC | Llama 2 13B Chat Model (4-bit quantized, 16-bit float) |

Mistral & Hermes Models
| Model Name | Description |
| --- | --- |
| Mistral-7B-Instruct-v0.3-q4f16_1-MLC | Mistral 7B Instruct v0.3 (4-bit quantized, 16-bit float) |
| Mistral-7B-Instruct-v0.3-q4f32_1-MLC | Mistral 7B Instruct v0.3 (4-bit quantized, 32-bit float) |
| Mistral-7B-Instruct-v0.2-q4f16_1-MLC | Mistral 7B Instruct v0.2 (4-bit quantized, 16-bit float) |
| Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC | Hermes 2 Pro Llama 3 8B Model (4-bit quantized, 16-bit float) |
| Hermes-2-Pro-Llama-3-8B-q4f32_1-MLC | Hermes 2 Pro Llama 3 8B Model (4-bit quantized, 32-bit float) |
| Hermes-2-Pro-Mistral-7B-q4f16_1-MLC | Hermes 2 Pro Mistral 7B Model (4-bit quantized, 16-bit float) |
| OpenHermes-2.5-Mistral-7B-q4f16_1-MLC | OpenHermes 2.5 Mistral 7B Model (4-bit quantized, 16-bit float) |
| NeuralHermes-2.5-Mistral-7B-q4f16_1-MLC | NeuralHermes 2.5 Mistral 7B Model (4-bit quantized, 16-bit float) |

Phi Models
| Model Name | Description |
| --- | --- |
| Phi-3-mini-4k-instruct-q4f16_1-MLC | Phi-3 Mini 4K Instruct Model (4-bit quantized, 16-bit float) |
| Phi-3-mini-4k-instruct-q4f32_1-MLC | Phi-3 Mini 4K Instruct Model (4-bit quantized, 32-bit float) |
| Phi-3-mini-4k-instruct-q4f16_1-MLC-1k | Phi-3 Mini 4K Instruct Model (4-bit quantized, 16-bit float, 1k context) |
| Phi-3-mini-4k-instruct-q4f32_1-MLC-1k | Phi-3 Mini 4K Instruct Model (4-bit quantized, 32-bit float, 1k context) |
| phi-2-q4f16_1-MLC | Phi-2 Model (4-bit quantized, 16-bit float) |
| phi-2-q4f32_1-MLC | Phi-2 Model (4-bit quantized, 32-bit float) |
| phi-2-q4f16_1-MLC-1k | Phi-2 Model (4-bit quantized, 16-bit float, 1k context) |
| phi-2-q4f32_1-MLC-1k | Phi-2 Model (4-bit quantized, 32-bit float, 1k context) |
| phi-1_5-q4f16_1-MLC | Phi-1.5 Model (4-bit quantized, 16-bit float) |
| phi-1_5-q4f32_1-MLC | Phi-1.5 Model (4-bit quantized, 32-bit float) |
| phi-1_5-q4f16_1-MLC-1k | Phi-1.5 Model (4-bit quantized, 16-bit float, 1k context) |
| phi-1_5-q4f32_1-MLC-1k | Phi-1.5 Model (4-bit quantized, 32-bit float, 1k context) |

RedPajama Models
| Model Name | Description |
| --- | --- |
| RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC | RedPajama Chat 3B Model (4-bit quantized, 16-bit float) |
| RedPajama-INCITE-Chat-3B-v1-q4f32_1-MLC | RedPajama Chat 3B Model (4-bit quantized, 32-bit float) |
| RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC-1k | RedPajama Chat 3B Model (4-bit quantized, 16-bit float, 1k context) |
| RedPajama-INCITE-Chat-3B-v1-q4f32_1-MLC-1k | RedPajama Chat 3B Model (4-bit quantized, 32-bit float, 1k context) |

SmolLM Models
| Model Name | Description |
| --- | --- |
| SmolLM2-1.7B-Instruct-q4f16_1-MLC | SmolLM2 1.7B Instruct Model (4-bit quantized, 16-bit float) |
| SmolLM2-1.7B-Instruct-q4f32_1-MLC | SmolLM2 1.7B Instruct Model (4-bit quantized, 32-bit float) |
| SmolLM2-360M-Instruct-q0f16-MLC | SmolLM2 360M Instruct Model (16-bit float) |
| SmolLM2-360M-Instruct-q0f32-MLC | SmolLM2 360M Instruct Model (32-bit float) |
| SmolLM2-360M-Instruct-q4f16_1-MLC | SmolLM2 360M Instruct Model (4-bit quantized, 16-bit float) |
| SmolLM2-360M-Instruct-q4f32_1-MLC | SmolLM2 360M Instruct Model (4-bit quantized, 32-bit float) |
| SmolLM2-135M-Instruct-q0f16-MLC | SmolLM2 135M Instruct Model (16-bit float) |
| SmolLM2-135M-Instruct-q0f32-MLC | SmolLM2 135M Instruct Model (32-bit float) |

TinyLlama Models
| Model Name | Description |
| --- | --- |
| TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC | TinyLlama Chat 1.1B v1.0 Model (4-bit quantized, 16-bit float) |
| TinyLlama-1.1B-Chat-v1.0-q4f32_1-MLC | TinyLlama Chat 1.1B v1.0 Model (4-bit quantized, 32-bit float) |
| TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC-1k | TinyLlama Chat 1.1B v1.0 Model (4-bit quantized, 16-bit float, 1k context) |
| TinyLlama-1.1B-Chat-v1.0-q4f32_1-MLC-1k | TinyLlama Chat 1.1B v1.0 Model (4-bit quantized, 32-bit float, 1k context) |
| TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC | TinyLlama Chat 1.1B v0.4 Model (4-bit quantized, 16-bit float) |
| TinyLlama-1.1B-Chat-v0.4-q4f32_1-MLC | TinyLlama Chat 1.1B v0.4 Model (4-bit quantized, 32-bit float) |
| TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC-1k | TinyLlama Chat 1.1B v0.4 Model (4-bit quantized, 16-bit float, 1k context) |
| TinyLlama-1.1B-Chat-v0.4-q4f32_1-MLC-1k | TinyLlama Chat 1.1B v0.4 Model (4-bit quantized, 32-bit float, 1k context) |

Qwen2 Models
| Model Name | Description |
| --- | --- |
| Qwen2-0.5B-Instruct-q4f16_1-MLC | Qwen2 0.5B Instruct Model (4-bit quantized, 16-bit float) |
| Qwen2-0.5B-Instruct-q0f16-MLC | Qwen2 0.5B Instruct Model (16-bit float) |
| Qwen2-0.5B-Instruct-q0f32-MLC | Qwen2 0.5B Instruct Model (32-bit float) |
| Qwen2-1.5B-Instruct-q4f16_1-MLC | Qwen2 1.5B Instruct Model (4-bit quantized, 16-bit float) |
| Qwen2-1.5B-Instruct-q4f32_1-MLC | Qwen2 1.5B Instruct Model (4-bit quantized, 32-bit float) |
| Qwen2-7B-Instruct-q4f16_1-MLC | Qwen2 7B Instruct Model (4-bit quantized, 16-bit float) |
| Qwen2-7B-Instruct-q4f32_1-MLC | Qwen2 7B Instruct Model (4-bit quantized, 32-bit float) |

Other Models
| Model Name | Description |
| --- | --- |
| gemma-2b-it-q4f16_1-MLC | Gemma 2B Instruct Model (4-bit quantized, 16-bit float) |
| gemma-2b-it-q4f32_1-MLC | Gemma 2B Instruct Model (4-bit quantized, 32-bit float) |
| gemma-2b-it-q4f16_1-MLC-1k | Gemma 2B Instruct Model (4-bit quantized, 16-bit float, 1k context) |
| gemma-2b-it-q4f32_1-MLC-1k | Gemma 2B Instruct Model (4-bit quantized, 32-bit float, 1k context) |
| stablelm-2-zephyr-1_6b-q4f16_1-MLC | StableLM 2 Zephyr 1.6B Model (4-bit quantized, 16-bit float) |
| stablelm-2-zephyr-1_6b-q4f32_1-MLC | StableLM 2 Zephyr 1.6B Model (4-bit quantized, 32-bit float) |
| stablelm-2-zephyr-1_6b-q4f16_1-MLC-1k | StableLM 2 Zephyr 1.6B Model (4-bit quantized, 16-bit float, 1k context) |
| stablelm-2-zephyr-1_6b-q4f32_1-MLC-1k | StableLM 2 Zephyr 1.6B Model (4-bit quantized, 32-bit float, 1k context) |
| WizardMath-7B-V1.1-q4f16_1-MLC | WizardMath 7B Model (4-bit quantized, 16-bit float) |

Response Format (Non-Streaming)

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "Llama-3.2-1B-Instruct-q4f32_1-MLC",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
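To extract just the assistant's reply on the command line, you can pipe the response through jq (assuming jq is installed):

curl -s http://localhost:6969/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.2-1B-Instruct-q4f32_1-MLC",
    "messages": [{"role": "user", "content": "Hello!"}]
  }' \
  | jq -r '.choices[0].message.content'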

Response Format (Streaming)

{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"Llama-3.2-1B-Instruct-q4f32_1-MLC", "choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}

{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"Llama-3.2-1B-Instruct-q4f32_1-MLC", "choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}

....

{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"Llama-3.2-1B-Instruct-q4f32_1-MLC", "choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}

Examples

Local Non-Streaming Request (cURL)

curl http://localhost:6969/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "stream": false,
    "model": "Llama-3.2-3B-Instruct-q4f16_1-MLC",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Can you tell me the meaning of life?"
      }
    ]
  }'

Local Streaming Request (cURL)

curl --http1.1 -N http://localhost:6969/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "stream": true,
    "model": "Llama-3.2-3B-Instruct-q4f16_1-MLC",
    "messages": [
      {
        "role": "assistant",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Can you tell me the meaning of life?"
      }
    ]
  }'

Remote Request (cURL)

curl --http1.1 -N https://api.rana.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "stream": true,
    "model": "Llama-3.2-3B-Instruct-q4f16_1-MLC",
    "messages": [
      {
        "role": "assistant",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Can you tell me the meaning of life?"
      }
    ]
  }'