Generate chat completions using Gemini ILM model with image inputs.
POST https://mavi-backend.memories.ai/serve/api/v2/iu/chat/completionsImage Understanding (ILM) endpoints use the /iu path prefix. Video Understanding (VLM) endpoints use /vu instead.gemini: prefix when used in the model parameter (e.g., gemini:gemini-2.5-flash).
| Model | Input Price | Output Price |
|---|---|---|
| gemini-3-pro-preview | $2/1M (≤200K), $4/1M (>200K) | $12/1M (≤200K), $18/1M (>200K) |
| gemini-2.5-pro | $1.25/1M (≤200K), $2.5/1M (>200K) | $10/1M (≤200K), $15/1M (>200K) |
| Model | Input Price | Output Price |
|---|---|---|
| gemini-3-flash-preview | $0.5/1M tokens | $3/1M tokens |
| gemini-2.5-flash | $0.30/1M tokens | $2.5/1M tokens |
| gemini-2.5-flash-preview-09-2025 | $0.30/1M tokens | $2.5/1M tokens |
| gemini-2.0-flash | $0.1/1M tokens | $0.4/1M tokens |
| Model | Input Price | Output Price |
|---|---|---|
| gemini-2.5-flash-lite | $0.1/1M tokens | $0.4/1M tokens |
| gemini-2.5-flash-lite-preview-09-2025 | $0.1/1M tokens | $0.4/1M tokens |
| gemini-2.0-flash-lite | $0.075/1M tokens | $0.3/1M tokens |
gemini: prefix in your API calls:"model": "gemini:gemini-2.5-flash""model": "gemini-2.5-flash"| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | The model to use (e.g., gemini:gemini-2.5-flash) |
| messages | array | Yes | - | Array of message objects. Each message contains: - role: Role type, values: system, user, assistant- content: Message content, can be a string or array. Array items can contain:- type: Content type, text or input_file- text: Text content (when type is text)- file_uri: File URL or base64 encoded file (when type is input_file)- mime_type: MIME type of the file (e.g., image/jpeg, video/mp4) |
| temperature | number | No | 0.7 | Controls randomness: 0.0-2.0, higher = more random |
| max_tokens | integer | No | 1000 | Maximum number of tokens to generate |
| top_p | number | No | 1.0 | Nucleus sampling: 0.0-1.0, consider tokens with top_p probability mass |
| frequency_penalty | number | No | 0.0 | Reduces repetition of frequent tokens: -2.0 to 2.0 |
| presence_penalty | number | No | 0.0 | Increases likelihood of new topics: -2.0 to 2.0 |
| n | integer | No | 1 | Number of completions to generate |
| stream | boolean | No | false | Whether to stream the response |
| stop | string | array | null | No | null | Stop sequences. Can be a string, array of strings, or null |
| extra_body | object | No | - | Additional body parameters. Contains: - metadata: Metadata object- thinking_config: Thinking configuration- thinking_budget: Integer value for thinking budget- response_mime_type: Response MIME type (application/json or json_schema)- responseSchema: JSON schema object for structured output |
| Parameter | Type | Description |
|---|---|---|
| id | string | Unique identifier for the completion |
| object | string | Object type, always “completion” |
| model | string | The model used for the completion |
| created_at | integer | Unix timestamp of when the completion was created |
| status | string | Status of the completion (e.g., “completed”) |
| choices | array | Array of completion choices |
| choices[].text | string | Text content of the completion |
| choices[].index | integer | Index of the choice in the choices array |
| usage | object | Token usage information |
| usage.input_tokens | integer | Number of input tokens used |
| usage.output_tokens | integer | Number of output tokens generated |
| usage.total_tokens | integer | Total number of tokens used |
| meta | object | Metadata about the completion |
| meta.provider | string | Provider name (e.g., “gemini”) |
| meta.provider_model | string | Provider-specific model name |
The model to use (e.g., gemini:gemini-2.5-flash)
"gemini:gemini-2.5-flash"
Array of message objects
Controls randomness: 0.0-2.0, higher = more random
0 <= x <= 2Maximum number of tokens to generate
Nucleus sampling: 0.0-1.0
0 <= x <= 1Reduces repetition of frequent tokens: -2.0 to 2.0
-2 <= x <= 2Increases likelihood of new topics: -2.0 to 2.0
-2 <= x <= 2Number of completions to generate
Whether to stream the response
Stop sequences
Chat completion response
Unique identifier for the completion
"resp_f8d13263-95b3-4337-b4c9-dbe9f6eb1e43"
Object type, always 'completion'
"completion"
The model used for the completion
"gemini:gemini-2.5-flash"
Unix timestamp of when the completion was created
1767093024
Status of the completion
"completed"