Generate chat completions with text, video, and image inputs using a Gemini VLM model.

Documentation Index: fetch the complete documentation index at https://api-tools.memories.ai/llms.txt and use it to discover all available pages before exploring further.
POST https://mavi-backend.memories.ai/serve/api/v2/vu/chat/completions

Video Understanding (VLM) endpoints use the /vu path prefix; Image Understanding (ILM) endpoints use /iu instead. Gemini models require the gemini: prefix in the model parameter (e.g., gemini:gemini-2.5-flash).
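A minimal request sketch in Python, assuming an OpenAI-style JSON body as described in the parameter table below; the authentication details are placeholders (this page does not show the auth header), so check your memories.ai dashboard for the exact scheme:

```python
import json

# Endpoint from this page; /vu is the Video Understanding prefix.
API_URL = "https://mavi-backend.memories.ai/serve/api/v2/vu/chat/completions"

def build_payload(model: str, prompt: str) -> dict:
    """Build a minimal chat-completion request body.

    The endpoint requires the gemini: prefix on Gemini model names,
    so it is prepended here if missing.
    """
    if not model.startswith("gemini:"):
        model = f"gemini:{model}"
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_payload("gemini-2.5-flash", "Summarize this conversation.")
print(json.dumps(payload, indent=2))
# POST this body to API_URL with your API key in the request headers
# (the exact auth header name is not shown on this page).
```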
**Pro Models**

| Model | Input Price (per 1M tokens) | Output Price (per 1M tokens) |
|---|---|---|
| gemini-3-pro-preview | $2.00 (≤200K), $4.00 (>200K) | $12.00 (≤200K), $18.00 (>200K) |
| gemini-2.5-pro | $1.25 (≤200K), $2.50 (>200K) | $10.00 (≤200K), $15.00 (>200K) |
**Flash Models**

| Model | Input Price (per 1M tokens) | Output Price (per 1M tokens) |
|---|---|---|
| gemini-3-flash-preview | $0.50 | $3.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.5-flash-preview-09-2025 | $0.30 | $2.50 |
| gemini-2.0-flash | $0.10 | $0.40 |
**Flash-Lite Models**

| Model | Input Price (per 1M tokens) | Output Price (per 1M tokens) |
|---|---|---|
| gemini-2.5-flash-lite | $0.10 | $0.40 |
| gemini-2.5-flash-lite-preview-09-2025 | $0.10 | $0.40 |
| gemini-2.0-flash-lite | $0.075 | $0.30 |
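As a worked example of the rates above, a small cost estimator; this is a sketch that ignores the Pro models' ≤200K/>200K tiering, which would require knowing the prompt size:

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate a completion's cost from the per-1M-token rates in the tables above."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# gemini-2.5-flash: $0.30 input / $2.50 output per 1M tokens
cost = estimate_cost_usd(input_tokens=12_000, output_tokens=800,
                         input_price_per_m=0.30, output_price_per_m=2.50)
print(f"${cost:.4f}")  # → $0.0056
```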
Always include the gemini: prefix in your API calls:

- Correct: "model": "gemini:gemini-2.5-flash"
- Incorrect: "model": "gemini-2.5-flash"

**Request Parameters**

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | The model to use (e.g., gemini:gemini-2.5-flash) |
| messages | array | Yes | - | Array of message objects. Each message has a role (system, user, or assistant) and content (a string or an array of content parts). Each content part contains: type (text or input_file); text (the text content, when type is text); file_uri (a file URL or base64-encoded file, when type is input_file; video does not support base64); mime_type (the file's MIME type, e.g., image/jpeg or video/mp4) |
| temperature | number | No | 0.7 | Controls randomness: 0.0-2.0, higher = more random |
| max_tokens | integer | No | 1000 | Maximum number of tokens to generate |
| top_p | number | No | 1.0 | Nucleus sampling: 0.0-1.0, consider tokens with top_p probability mass |
| frequency_penalty | number | No | 0.0 | Reduces repetition of frequent tokens: -2.0 to 2.0 |
| presence_penalty | number | No | 0.0 | Increases likelihood of new topics: -2.0 to 2.0 |
| n | integer | No | 1 | Number of completions to generate |
| stream | boolean | No | false | Whether to stream the response |
| stop | string \| array \| null | No | null | Stop sequences. Can be a string, an array of strings, or null |
| extra_body | object | No | - | Additional body parameters: metadata (metadata object); thinking_config (thinking configuration); thinking_budget (integer value for the thinking budget); response_mime_type (response MIME type, application/json or json_schema); responseSchema (JSON schema object for structured output) |
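The content-part shapes in the messages row can be built as below. This is a sketch: the part layout (type, file_uri, mime_type) comes from the table, but whether base64 images need a data-URI wrapper is not specified here, so the raw-base64 form is an assumption.

```python
import base64

def video_part_from_url(url: str, mime_type: str = "video/mp4") -> dict:
    """input_file part for a video; videos must be passed by URL (no base64)."""
    return {"type": "input_file", "file_uri": url, "mime_type": mime_type}

def image_part_from_file(path: str, mime_type: str = "image/jpeg") -> dict:
    """input_file part for a local image, base64-encoded into file_uri.

    Assumption: the API accepts bare base64 in file_uri for images;
    a data-URI wrapper may be required instead.
    """
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {"type": "input_file", "file_uri": encoded, "mime_type": mime_type}

messages = [
    {"role": "system", "content": "You are a concise video analyst."},
    {"role": "user", "content": [
        {"type": "text", "text": "What happens in this clip?"},
        video_part_from_url("https://example.com/clip.mp4"),
    ]},
]
```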
**Response Fields**

| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the completion |
| object | string | Object type, always "completion" |
| model | string | The model used for the completion |
| created_at | integer | Unix timestamp of when the completion was created |
| status | string | Status of the completion (e.g., "completed") |
| choices | array | Array of completion choices |
| choices[].text | string | Text content of the completion |
| choices[].index | integer | Index of the choice in the choices array |
| usage | object | Token usage information |
| usage.input_tokens | integer | Number of input tokens used |
| usage.output_tokens | integer | Number of output tokens generated |
| usage.total_tokens | integer | Total number of tokens used |
| meta | object | Metadata about the completion |
| meta.provider | string | Provider name (e.g., "gemini") |
| meta.provider_model | string | Provider-specific model name |
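Pulling the answer text and token count out of a response shaped like the field table above; the sample's choices, usage, and meta values are illustrative, not real API output:

```python
def extract_completion(response: dict) -> tuple:
    """Return the first choice's text and the total token count from a
    chat-completion response shaped like the field table above."""
    text = response["choices"][0]["text"]
    total = response["usage"]["total_tokens"]
    return text, total

# Illustrative sample shaped per the field table (values are made up).
sample = {
    "id": "resp_5810813e-99b9-427a-8736-23cf34573627",
    "object": "completion",
    "model": "gemini:gemini-2.5-flash",
    "created_at": 1767093284,
    "status": "completed",
    "choices": [{"text": "A dog runs along the beach.", "index": 0}],
    "usage": {"input_tokens": 1200, "output_tokens": 9, "total_tokens": 1209},
    "meta": {"provider": "gemini", "provider_model": "gemini-2.5-flash"},
}
text, total = extract_completion(sample)
print(text, total)  # → A dog runs along the beach. 1209
```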
Example field values for a chat completion response:

- id: "resp_5810813e-99b9-427a-8736-23cf34573627"
- object: "completion"
- model: "gemini:gemini-2.5-flash"
- created_at: 1767093284
- status: "completed"