Generate chat completions using a GPT model with image inputs.
This endpoint lets you generate chat completions whose inputs include images, served by a GPT model through the Image Understanding (ILM) service.
POST https://mavi-backend.memories.ai/serve/api/v2/iu/chat/completions

Image Understanding (ILM) endpoints use the /iu path prefix; Video Understanding (VLM) endpoints use /vu instead. Model names take the gpt: prefix in the model parameter (e.g., gpt:gpt-4o).
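A minimal sketch of building a request body for this endpoint. `build_payload` is a hypothetical helper, not part of any SDK, and the authentication header is not shown here; check your API key documentation for the required header.

```python
import json

# Endpoint from the documentation above.
API_URL = "https://mavi-backend.memories.ai/serve/api/v2/iu/chat/completions"

def build_payload(question, image_url, model="gpt:gpt-4o"):
    """Assemble a chat-completions body with one text part and one image part."""
    return {
        "model": model,  # note the required "gpt:" prefix
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    # "url" may be an image URL or a base64-encoded image
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 300,
    }

payload = build_payload("What is in this image?", "https://example.com/photo.jpg")
print(json.dumps(payload, indent=2))
```

Send the payload with any HTTP client, e.g. `requests.post(API_URL, json=payload, headers=...)`.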
| Model | Input Price | Output Price |
|---|---|---|
| gpt-5.2 | $1.75/1M tokens | $14/1M tokens |
| gpt-5.2-2025-12-11 | $1.75/1M tokens | $14/1M tokens |
| gpt-5.2-chat-latest | $1.75/1M tokens | $14/1M tokens |
| gpt-5.1 | $1.25/1M tokens | $10/1M tokens |
| gpt-5.1-2025-11-13 | $1.25/1M tokens | $10/1M tokens |
| gpt-5.1-chat-latest | $1.25/1M tokens | $10/1M tokens |
| gpt-5 | $1.25/1M tokens | $10/1M tokens |
| gpt-5-2025-08-07 | $1.25/1M tokens | $10/1M tokens |
| gpt-5-chat-latest | $1.25/1M tokens | $10/1M tokens |
| gpt-5-mini | $0.25/1M tokens | $2/1M tokens |
| gpt-5-mini-2025-08-07 | $0.25/1M tokens | $2/1M tokens |
| gpt-5-nano | $0.05/1M tokens | $0.4/1M tokens |
| gpt-5-nano-2025-08-07 | $0.05/1M tokens | $0.4/1M tokens |
| Model | Input Price | Output Price |
|---|---|---|
| o1 | $15/1M tokens | $60/1M tokens |
| o1-2024-12-17 | $15/1M tokens | $60/1M tokens |
| o3 | $2/1M tokens | $8/1M tokens |
| o3-2025-04-16 | $2/1M tokens | $8/1M tokens |
| o4-mini | $1.1/1M tokens | $4.4/1M tokens |
| o4-mini-2025-04-16 | $1.1/1M tokens | $4.4/1M tokens |
| Model | Input Price | Output Price |
|---|---|---|
| gpt-4.1 | $2/1M tokens | $8/1M tokens |
| gpt-4.1-2025-04-14 | $2/1M tokens | $8/1M tokens |
| gpt-4.1-mini | $0.4/1M tokens | $1.6/1M tokens |
| gpt-4.1-mini-2025-04-14 | $0.4/1M tokens | $1.6/1M tokens |
| gpt-4.1-nano | $0.1/1M tokens | $0.4/1M tokens |
| gpt-4.1-nano-2025-04-14 | $0.1/1M tokens | $0.4/1M tokens |
| Model | Input Price | Output Price |
|---|---|---|
| gpt-4o | $2.5/1M tokens | $10/1M tokens |
| gpt-4o-2024-08-06 | $2.5/1M tokens | $10/1M tokens |
| gpt-4o-2024-05-13 | $5/1M tokens | $15/1M tokens |
| chatgpt-4o-latest | $5/1M tokens | $15/1M tokens |
| gpt-4o-mini | $0.15/1M tokens | $0.60/1M tokens |
| gpt-4o-mini-2024-07-18 | $0.15/1M tokens | $0.60/1M tokens |
| Model | Input Price | Output Price |
|---|---|---|
| gpt-4-turbo | $10/1M tokens | $30/1M tokens |
| gpt-4-turbo-2024-04-09 | $10/1M tokens | $30/1M tokens |
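As a worked example of the prices above, per-request cost is each token count divided by one million, times the per-million price. The helper below is a sketch, not part of the API; prices are copied from the gpt-4o row.

```python
# Per-million-token prices (USD) from the gpt-4o row of the pricing table.
GPT_4O_INPUT = 2.5
GPT_4O_OUTPUT = 10.0

def estimate_cost(prompt_tokens, completion_tokens,
                  input_price=GPT_4O_INPUT, output_price=GPT_4O_OUTPUT):
    # cost = tokens / 1,000,000 * price-per-million-tokens
    return (prompt_tokens / 1_000_000) * input_price + \
           (completion_tokens / 1_000_000) * output_price

# 1,000 prompt tokens and 500 completion tokens with gpt-4o:
print(f"${estimate_cost(1000, 500):.4f}")  # $0.0075
```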
Specify the model with the gpt: prefix: `"model": "gpt:gpt-4o"`

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | The model to use (e.g., gpt:gpt-4o) |
| messages | array | Yes | - | Array of message objects. Each message has `role` (one of `system`, `user`, `assistant`) and `content` (a string, or an array of parts, where each part has `type` (`text` or `image_url`) plus `text` for text parts, or `image_url` (an object whose `url` is an image URL or a base64-encoded image) for image parts) |
| response_format | object | No | - | Force JSON output format. Contains type field with value json_object |
| temperature | number | No | 0 | Controls randomness: 0.0-2.0, 0 = deterministic |
| max_tokens | integer | No | 1000 | Maximum number of tokens to generate |
| top_p | number | No | 1.0 | Nucleus sampling: 0.0-1.0, consider tokens with top_p probability mass |
| frequency_penalty | number | No | 0.0 | Reduces repetition of frequent tokens: -2.0 to 2.0 |
| presence_penalty | number | No | 0.0 | Increases likelihood of new topics: -2.0 to 2.0 |
| n | integer | No | 1 | Number of completions to generate |
| stream | boolean | No | false | Whether to stream the response |
| stop | string \| array \| null | No | null | Stop sequences. Can be a string, array of strings, or null |
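A request body exercising the optional parameters above might look like the following sketch; the field values are illustrative, not recommendations.

```python
# Illustrative request body: forced JSON output, deterministic sampling,
# and a stop sequence, alongside a text + image user message.
payload = {
    "model": "gpt:gpt-4o",
    "messages": [
        {"role": "system", "content": "Reply in JSON."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image as JSON."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/img.png"}},
            ],
        },
    ],
    "response_format": {"type": "json_object"},  # force JSON output
    "temperature": 0,      # 0 = deterministic
    "max_tokens": 500,
    "stop": ["END"],       # string, array of strings, or null
}
```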
| Parameter | Type | Description |
|---|---|---|
| id | string | Unique identifier for the chat completion |
| object | string | Object type, always "chat.completion" |
| created | integer | Unix timestamp of when the completion was created |
| model | string | The model used for the completion |
| choices | array | Array of completion choices |
| choices[].index | integer | Index of the choice in the choices array |
| choices[].message | object | Message object containing the assistant’s response |
| choices[].message.role | string | Role of the message, always "assistant" |
| choices[].message.content | string | Content of the message |
| choices[].message.refusal | string \| null | Refusal message if the request was refused |
| choices[].message.annotations | array | Annotations for the message |
| choices[].logprobs | object \| null | Log probability information |
| choices[].finish_reason | string | Reason why the completion finished |
| usage | object | Token usage information |
| usage.prompt_tokens | integer | Number of tokens in the prompt |
| usage.completion_tokens | integer | Number of tokens in the completion |
| usage.total_tokens | integer | Total number of tokens used |
| usage.prompt_tokens_details | object | Detailed prompt token information |
| usage.prompt_tokens_details.cached_tokens | integer | Number of cached tokens |
| usage.prompt_tokens_details.audio_tokens | integer | Number of audio tokens |
| usage.completion_tokens_details | object | Detailed completion token information |
| usage.completion_tokens_details.reasoning_tokens | integer | Number of reasoning tokens |
| usage.completion_tokens_details.audio_tokens | integer | Number of audio tokens |
| usage.completion_tokens_details.accepted_prediction_tokens | integer | Number of accepted prediction tokens |
| usage.completion_tokens_details.rejected_prediction_tokens | integer | Number of rejected prediction tokens |
| service_tier | string | The service tier used for the request |
| system_fingerprint | string | System fingerprint for the model version |
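A minimal sketch of reading a response: pull the assistant's text from `choices[0].message.content` and the token counts from `usage`. The `choices` and `usage` values below are illustrative, not from a real API call.

```python
# Sample response shaped like the schema above; choices/usage values
# are made up for illustration.
sample = {
    "id": "chatcmpl-CsRhYgDaLSNjl80v5uYBufEDbJqAM",
    "object": "chat.completion",
    "created": 1767092232,
    "model": "gpt-4o-2024-08-06",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant",
                        "content": "A cat on a sofa.",
                        "refusal": None},
            "logprobs": None,
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 900, "completion_tokens": 12, "total_tokens": 912},
    "service_tier": "default",
    "system_fingerprint": "fp_deacdd5f6f",
}

answer = sample["choices"][0]["message"]["content"]
total_tokens = sample["usage"]["total_tokens"]
print(answer)        # A cat on a sofa.
print(total_tokens)  # 912
```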
Example response values (JSON format): `id` "chatcmpl-CsRhYgDaLSNjl80v5uYBufEDbJqAM", `object` "chat.completion", `created` 1767092232, `model` "gpt-4o-2024-08-06", `service_tier` "default", `system_fingerprint` "fp_deacdd5f6f".