Generate chat completions using a Qwen ILM model with image inputs.

POST https://mavi-backend.memories.ai/serve/api/v2/iu/chat/completions

Image Understanding (ILM) endpoints use the /iu path prefix; Video Understanding (VLM) endpoints use /vu instead. Qwen models require the qwen: prefix in the model parameter (e.g., qwen:qwen3-vl-plus).
| Model | Input Price | Output Price |
|---|---|---|
| qwen3-vl-plus | $0.00021/1K (≤32K), $0.00031/1K (32K-128K), $0.00063/1K (128K-256K) | $0.00168/1K (≤32K), $0.00252/1K (32K-128K), $0.00503/1K (128K-256K) |
| qwen3-vl-plus-2025-12-19 | $0.00021/1K (≤32K), $0.00031/1K (32K-128K), $0.00063/1K (128K-256K) | $0.00168/1K (≤32K), $0.00252/1K (32K-128K), $0.00503/1K (128K-256K) |
| qwen3-vl-plus-2025-09-23 | $0.00021/1K (≤32K), $0.00031/1K (32K-128K), $0.00063/1K (128K-256K) | $0.00168/1K (≤32K), $0.00252/1K (32K-128K), $0.00503/1K (128K-256K) |
| Model | Input Price | Output Price |
|---|---|---|
| qwen3-vl-flash | $0.000052/1K (≤32K), $0.000079/1K (32K-128K), $0.000126/1K (128K-256K) | $0.00042/1K (≤32K), $0.00063/1K (32K-128K), $0.00101/1K (128K-256K) |
| qwen3-vl-flash-2025-10-15 | $0.000052/1K (≤32K), $0.000079/1K (32K-128K), $0.000126/1K (128K-256K) | $0.00042/1K (≤32K), $0.00063/1K (32K-128K), $0.00101/1K (128K-256K) |
| Model | Input Price | Output Price |
|---|---|---|
| qwen-vl-max | $0.00084/1K tokens | $0.00335/1K tokens |
| qwen-vl-max-latest | $0.00084/1K tokens | $0.00335/1K tokens |
| qwen-vl-max-2025-08-13 | $0.00084/1K tokens | $0.00335/1K tokens |
| qwen-vl-max-2025-04-08 | $0.00084/1K tokens | $0.00335/1K tokens |
| Model | Input Price | Output Price |
|---|---|---|
| qwen-vl-plus | $0.00022/1K tokens | $0.00066/1K tokens |
| qwen-vl-plus-latest | $0.00022/1K tokens | $0.00066/1K tokens |
| qwen-vl-plus-2025-08-15 | $0.00022/1K tokens | $0.00066/1K tokens |
| qwen-vl-plus-2025-05-07 | $0.00022/1K tokens | $0.00066/1K tokens |
| qwen-vl-plus-2025-01-25 | $0.00022/1K tokens | $0.00066/1K tokens |
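The tiered prices above can be turned into a per-request cost estimate. This is a minimal sketch for qwen3-vl-plus, assuming the pricing tier is selected by the input-token count; the tables do not spell out how the tier is chosen.

```python
# Tiered per-1K-token prices for qwen3-vl-plus, taken from the table above.
# Assumption: the input-token count selects the tier.
PLUS_TIERS = [
    (32_000, 0.00021, 0.00168),    # <=32K context
    (128_000, 0.00031, 0.00252),   # 32K-128K
    (256_000, 0.00063, 0.00503),   # 128K-256K
]

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    for limit, input_rate, output_rate in PLUS_TIERS:
        if input_tokens <= limit:
            return (input_tokens / 1000) * input_rate + (output_tokens / 1000) * output_rate
    raise ValueError("input exceeds the 256K-token pricing range")

# 10K input tokens + 500 output tokens in the <=32K tier:
# 10 * 0.00021 + 0.5 * 0.00168 = 0.0021 + 0.00084 = 0.00294 USD
print(f"{estimate_cost(10_000, 500):.5f}")
```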
Qwen models require the qwen: prefix: "model": "qwen:qwen3-vl-plus"

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | The model to use (e.g., qwen:qwen3-vl-plus) |
| messages | array | Yes | - | Array of message objects. Each message contains a role (system or user) and content (a string, or an array of items, where each item can contain an image field holding an image URL or base64-encoded image, or a text field holding text content) |
| temperature | number | No | 0.7 | Controls randomness: 0.0-2.0, higher = more random |
| max_tokens | integer | No | 1024 | Maximum number of tokens to generate |
| top_p | number | No | 0.9 | Nucleus sampling: 0.0-1.0, consider tokens with top_p probability mass |
| frequency_penalty | number | No | 0.0 | Reduces repetition of frequent tokens: -2.0 to 2.0 |
| presence_penalty | number | No | 0.0 | Increases likelihood of new topics: -2.0 to 2.0 |
| n | integer | No | 1 | Number of completions to generate |
| stream | boolean | No | false | Whether to stream the response |
| stop | string \| array \| null | No | null | Stop sequences. Can be a string, array of strings, or null |
| extra_body | object | No | - | Additional body parameters: metadata (metadata object), enable_thinking (boolean to enable thinking mode), thinking_budget (integer thinking budget), and response_format (structured-output configuration with type set to json_schema and a json_schema object holding a name and a schema definition) |
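The request parameters above can be sketched as a payload builder. The content-item shape ({"image": ...} and {"text": ...}) is inferred from the messages description, and the extra_body values are illustrative; how the request is authenticated is not shown in this reference.

```python
import json

API_URL = "https://mavi-backend.memories.ai/serve/api/v2/iu/chat/completions"

def build_payload(image_url: str, question: str) -> dict:
    """Assemble a request body for the ILM chat completions endpoint.

    Content-item shape is inferred from the messages parameter: each
    array item carries either an `image` or a `text` field.
    """
    return {
        "model": "qwen:qwen3-vl-plus",  # the qwen: prefix is required
        "messages": [
            {
                "role": "user",
                "content": [
                    {"image": image_url},
                    {"text": question},
                ],
            }
        ],
        "temperature": 0.7,
        "max_tokens": 1024,
    }

payload = build_payload("https://example.com/cat.jpg", "What is in this image?")

# Optional extra_body parameters (values here are illustrative):
payload["extra_body"] = {"enable_thinking": True, "thinking_budget": 512}

print(json.dumps(payload, indent=2))
# Send with, e.g., requests.post(API_URL, json=payload, headers=...),
# adding whatever auth header your deployment requires.
```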
| Parameter | Type | Description |
|---|---|---|
| id | string | Unique identifier for the completion |
| object | string | Object type, always "completion" |
| model | string | The model used for the completion |
| created_at | integer | Unix timestamp of when the completion was created |
| status | string | Status of the completion (e.g., "completed") |
| choices | array | Array of completion choices |
| choices[].text | string | Text content of the completion |
| choices[].index | integer | Index of the choice in the choices array |
| usage | object | Token usage information |
| usage.input_tokens | integer | Number of input tokens used |
| usage.output_tokens | integer | Number of output tokens generated |
| usage.total_tokens | integer | Total number of tokens used |
| meta | object | Metadata about the completion |
| meta.provider | string | Provider name (e.g., "qwen") |
| meta.provider_model | string | Provider-specific model name |
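The response fields above can be sketched as one worked example. The id, object, model, created_at, and status values are the sample values shown in this reference; the choices text and usage numbers are illustrative placeholders.

```python
import json

# Example response assembled from the field table above. Note that the
# model field in the response omits the qwen: prefix.
example = json.loads("""
{
  "id": "6612267a-d08f-9ea0-a254-7bcea6339f49",
  "object": "completion",
  "model": "qwen3-vl-plus",
  "created_at": 1767094289,
  "status": "completed",
  "choices": [{"text": "A cat sitting on a windowsill.", "index": 0}],
  "usage": {"input_tokens": 1350, "output_tokens": 18, "total_tokens": 1368},
  "meta": {"provider": "qwen", "provider_model": "qwen3-vl-plus"}
}
""")

# Pull the generated text and token usage out of the response.
print(example["choices"][0]["text"])
print(example["usage"]["total_tokens"])
```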