Chat Completions Gemini

POST /vu/chat/completions
curl --request POST \
  --url https://mavi-backend.memories.ai/serve/api/v2/vu/chat/completions \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gemini:gemini-2.5-flash",
  "messages": [
    {
      "role": "system",
      "content": "<string>"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "n": 1,
  "stream": false,
  "stop": "<string>",
  "extra_body": {
    "metadata": {
      "thinking_config": {
        "thinking_budget": 123
      },
      "response_mime_type": "application/json",
      "responseSchema": {
        "type": "OBJECT",
        "properties": {},
        "required": [
          "<string>"
        ]
      }
    }
  }
}
'


This endpoint generates chat completions from text, video, and image inputs using the Gemini VLM models.

Endpoint: POST https://mavi-backend.memories.ai/serve/api/v2/vu/chat/completions

Video Understanding (VLM) endpoints use the /vu path prefix; Image Understanding (ILM) endpoints use /iu instead.

Supported Models

All models require the gemini: prefix when used in the model parameter (e.g., gemini:gemini-2.5-flash).

Premium Models

| Model | Input Price | Output Price |
| --- | --- | --- |
| gemini-3-pro-preview | $2/1M (≤200K), $4/1M (>200K) | $12/1M (≤200K), $18/1M (>200K) |
| gemini-2.5-pro | $1.25/1M (≤200K), $2.50/1M (>200K) | $10/1M (≤200K), $15/1M (>200K) |

Flash Models (High Performance)

| Model | Input Price | Output Price |
| --- | --- | --- |
| gemini-3-flash-preview | $0.50/1M tokens | $3/1M tokens |
| gemini-2.5-flash | $0.30/1M tokens | $2.50/1M tokens |
| gemini-2.5-flash-preview-09-2025 | $0.30/1M tokens | $2.50/1M tokens |
| gemini-2.0-flash | $0.10/1M tokens | $0.40/1M tokens |

Lite Models (Cost-Effective)

| Model | Input Price | Output Price |
| --- | --- | --- |
| gemini-2.5-flash-lite | $0.10/1M tokens | $0.40/1M tokens |
| gemini-2.5-flash-lite-preview-09-2025 | $0.10/1M tokens | $0.40/1M tokens |
| gemini-2.0-flash-lite | $0.075/1M tokens | $0.30/1M tokens |
When using these models, remember to include the gemini: prefix in your API calls:
  • ✅ Correct: "model": "gemini:gemini-2.5-flash"
  • ❌ Incorrect: "model": "gemini-2.5-flash"
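Per-request cost can be estimated from the usage block the API returns together with the per-token rates above. The sketch below covers only the flat-rate models (tiered models such as gemini-2.5-pro depend on prompt size); the rate table and helper name are illustrative, not part of the API:

```python
# Sketch: estimate request cost (USD) from the usage block and the flat
# per-1M-token rates listed in the tables above. Tiered models (e.g.
# gemini-2.5-pro) are omitted because their rate depends on prompt size.
RATES = {
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.0-flash": (0.10, 0.40),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "gemini-2.0-flash-lite": (0.075, 0.30),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    # Accept the model name with or without the gemini: prefix.
    in_rate, out_rate = RATES[model.removeprefix("gemini:")]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# The usage block from the example response: 16,983 input / 37 output tokens.
cost = estimate_cost("gemini:gemini-2.5-flash", 16983, 37)
print(f"${cost:.6f}")  # → $0.005187
```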

Request Body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | - | The model to use (e.g., gemini:gemini-2.5-flash) |
| messages | array | Yes | - | Array of message objects (see below) |
| temperature | number | No | 0.7 | Controls randomness: 0.0-2.0, higher = more random |
| max_tokens | integer | No | 1000 | Maximum number of tokens to generate |
| top_p | number | No | 1.0 | Nucleus sampling: 0.0-1.0, consider tokens within the top_p probability mass |
| frequency_penalty | number | No | 0.0 | Reduces repetition of frequent tokens: -2.0 to 2.0 |
| presence_penalty | number | No | 0.0 | Increases likelihood of new topics: -2.0 to 2.0 |
| n | integer | No | 1 | Number of completions to generate |
| stream | boolean | No | false | Whether to stream the response |
| stop | string \| array \| null | No | null | Stop sequences: a string, an array of strings, or null |
| extra_body | object | No | - | Additional body parameters (see below) |

Each message object contains:
- role: Role type; one of system, user, assistant
- content: Message content, either a string or an array. Array items can contain:
  - type: Content type, text or input_file
  - text: Text content (when type is text)
  - file_uri: File URL or base64-encoded file (when type is input_file). Note: video does not support base64
  - mime_type: MIME type of the file (e.g., image/jpeg, video/mp4)

The extra_body object contains:
- metadata: Metadata object
  - thinking_config: Thinking configuration
    - thinking_budget: Integer value for the thinking budget
  - response_mime_type: Response MIME type (application/json or json_schema)
  - responseSchema: JSON schema object for structured output
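The multimodal content array described above can be assembled programmatically. The helper below is a hypothetical sketch (the function name is not part of the API) that builds a user message from a text prompt plus a list of (file_uri, mime_type) pairs:

```python
# Hypothetical helper (not part of the API): build a multimodal user message
# in the shape the `messages` parameter expects.
def build_user_message(prompt: str, files: list[tuple[str, str]]) -> dict:
    """files is a list of (file_uri, mime_type) pairs. Video URIs must be
    URLs, since base64 is not supported for video."""
    content = [{"type": "text", "text": prompt}]
    for file_uri, mime_type in files:
        content.append({
            "type": "input_file",
            "file_uri": file_uri,
            "mime_type": mime_type,
        })
    return {"role": "user", "content": content}

msg = build_user_message(
    "Please summarize the content of this video",
    [("https://example.com/clip.mp4", "video/mp4")],
)
```

The returned dict can be placed directly into the messages array of the request body.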

Code Example

from openai import OpenAI

client = OpenAI(
    api_key="sk-mai-this_a_test_string_please_use_your_generated_key_during_testing",
    base_url="https://mavi-backend.memories.ai/serve/api/v2/vu"
)

def call_my_vlm():
    resp = client.chat.completions.create(
        model="gemini:gemini-2.5-flash",  # e.g. gemini:gemini-3-flash-preview or gemini:gemini-2.5-flash
        messages=[
            {"role": "system", "content": "You are a multimodal assistant. Keep your answers concise."},
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Please summarize the content of this video and image"
                    },
                    {
                        "type": "input_file",
                        "file_uri": "https://storage.googleapis.com/memories-test-data/gun5.png",  # base64 or url, video does not support base64
                        "mime_type": "image/png"
                    },
                    {
                        "type": "input_file",
                        "file_uri": "https://storage.googleapis.com/memories-test-data/test_1min.mp4",  # base64 or url, video does not support base64
                        "mime_type": "video/mp4"
                    }
                ]
            }
        ],
        temperature=0.7,  # Controls randomness: 0.0-2.0, higher = more random
        max_tokens=1000,  # Maximum number of tokens to generate
        top_p=1.0,  # Nucleus sampling: 0.0-1.0, consider tokens with top_p probability mass
        frequency_penalty=0.0,  # -2.0 to 2.0, reduces repetition of frequent tokens
        presence_penalty=0.0,  # -2.0 to 2.0, increases likelihood of new topics
        n=1,  # Number of completions to generate
        stream=False,  # Whether to stream the response
        stop=None,  # Stop sequences (list of strings)
        extra_body={
            "metadata": {
                "thinking_config": {
                    "thinking_budget": 1024
                },
                "response_mime_type": "application/json",  # application/json, json_schema
                "responseSchema": {
                    "type": "OBJECT",
                    "properties": {
                        "video_summary": {
                            "type": "STRING",
                            "description": "Summary of the video content from 1 second to 8 seconds."
                        },
                        "image_summary": {
                            "type": "STRING",
                            "description": "Summary of the image content."
                        }
                    },
                    "required": [
                        "video_summary",
                        "image_summary"
                    ]
                }
            }
        }
    )
    return resp

# Usage example
result = call_my_vlm()
print(result)

Response

Returns the chat completion response with structured output.
{
  "id": "resp_5810813e-99b9-427a-8736-23cf34573627",
  "object": "completion",
  "model": "gemini:gemini-2.5-flash",
  "created_at": 1767093284,
  "status": "completed",
  "choices": [
    {
      "text": "This video demonstrates the effects of vacuum on three different states of oranges.\n\n1. **Setup**: Three orange-related items are placed in a vacuum chamber: a whole orange",
      "index": 0
    }
  ],
  "usage": {
    "input_tokens": 16983,
    "output_tokens": 37,
    "total_tokens": 17020
  },
  "meta": {
    "provider": "gemini",
    "provider_model": "gemini-2.5-flash"
  }
}

Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for the completion |
| object | string | Object type, always "completion" |
| model | string | The model used for the completion |
| created_at | integer | Unix timestamp of when the completion was created |
| status | string | Status of the completion (e.g., "completed") |
| choices | array | Array of completion choices |
| choices[].text | string | Text content of the completion |
| choices[].index | integer | Index of the choice in the choices array |
| usage | object | Token usage information |
| usage.input_tokens | integer | Number of input tokens used |
| usage.output_tokens | integer | Number of output tokens generated |
| usage.total_tokens | integer | Total number of tokens used |
| meta | object | Metadata about the completion |
| meta.provider | string | Provider name (e.g., "gemini") |
| meta.provider_model | string | Provider-specific model name |
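When response_mime_type is set to application/json, the model's structured output arrives as a JSON string inside choices[].text and must be decoded by the caller. A minimal sketch, assuming the response body has already been deserialized into a dict shaped like the example above (the helper name is illustrative):

```python
import json

# Sketch: pull the structured JSON string out of choices[0].text and decode it.
# `response` is an already-parsed response body shaped like the example above.
def extract_structured_output(response: dict) -> dict:
    text = response["choices"][0]["text"]
    return json.loads(text)

response = {
    "choices": [
        {
            "text": '{"video_summary": "A vacuum chamber demo.", '
                    '"image_summary": "An orange."}',
            "index": 0,
        }
    ],
}
data = extract_structured_output(response)
print(data["video_summary"])  # → A vacuum chamber demo.
```

Note that json.loads will raise if the model returned plain prose, so this only applies to requests that set response_mime_type.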

Authorizations

Authorization
string
header
required

Body

application/json
model
string
required

The model to use (e.g., gemini:gemini-2.5-flash)

Example:

"gemini:gemini-2.5-flash"

messages
object[]
required

Array of message objects

temperature
number
default:0.7

Controls randomness: 0.0-2.0, higher = more random

Required range: 0 <= x <= 2
max_tokens
integer
default:1000

Maximum number of tokens to generate

top_p
number
default:1

Nucleus sampling: 0.0-1.0

Required range: 0 <= x <= 1
frequency_penalty
number
default:0

Reduces repetition of frequent tokens: -2.0 to 2.0

Required range: -2 <= x <= 2
presence_penalty
number
default:0

Increases likelihood of new topics: -2.0 to 2.0

Required range: -2 <= x <= 2
n
integer
default:1

Number of completions to generate

stream
boolean
default:false

Whether to stream the response

stop
string | array | null

Stop sequences

extra_body
object

Additional body parameters for thinking configuration and structured output

Response

200 - application/json

Chat completion response

id
string
required

Unique identifier for the completion

Example:

"resp_5810813e-99b9-427a-8736-23cf34573627"

object
string
required

Object type, always 'completion'

Example:

"completion"

model
string
required

The model used for the completion

Example:

"gemini:gemini-2.5-flash"

created_at
integer
required

Unix timestamp of when the completion was created

Example:

1767093284

status
string
required

Status of the completion

Example:

"completed"

choices
object[]
required
usage
object
required
meta
object
required