Generate chat completions with text, video, and image inputs using a Qwen VLM model.

Documentation Index
Fetch the complete documentation index at: https://api-tools.memories.ai/llms.txt
Use this file to discover all available pages before exploring further.
POST https://mavi-backend.memories.ai/serve/api/v2/vu/chat/completions

Video Understanding (VLM) endpoints use the /vu path prefix; Image Understanding (ILM) endpoints use /iu instead. Qwen models require the qwen: prefix in the model parameter (e.g., qwen:qwen3-vl-plus).
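A minimal request sketch using only the standard library. The API key placeholder and the exact authentication header are assumptions; check the platform's authentication docs for the real scheme:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder; obtain a real key from the platform
URL = "https://mavi-backend.memories.ai/serve/api/v2/vu/chat/completions"

# Note the required qwen: prefix on the model name.
payload = {
    "model": "qwen:qwen3-vl-plus",
    "messages": [
        {"role": "user", "content": "Describe this video in one sentence."}
    ],
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": API_KEY,  # assumption: header name/scheme may differ
    },
    method="POST",
)
# resp = urllib.request.urlopen(req)  # uncomment to actually send the request
```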
| Model | Input Price | Output Price |
|---|---|---|
| qwen3-vl-plus | $0.00021/1K (≤32K), $0.00031/1K (32K-128K), $0.00063/1K (128K-256K) | $0.00168/1K (≤32K), $0.00252/1K (32K-128K), $0.00503/1K (128K-256K) |
| qwen3-vl-plus-2025-12-19 | $0.00021/1K (≤32K), $0.00031/1K (32K-128K), $0.00063/1K (128K-256K) | $0.00168/1K (≤32K), $0.00252/1K (32K-128K), $0.00503/1K (128K-256K) |
| qwen3-vl-plus-2025-09-23 | $0.00021/1K (≤32K), $0.00031/1K (32K-128K), $0.00063/1K (128K-256K) | $0.00168/1K (≤32K), $0.00252/1K (32K-128K), $0.00503/1K (128K-256K) |
| Model | Input Price | Output Price |
|---|---|---|
| qwen3-vl-flash | $0.000052/1K (≤32K), $0.000079/1K (32K-128K), $0.000126/1K (128K-256K) | $0.00042/1K (≤32K), $0.00063/1K (32K-128K), $0.00101/1K (128K-256K) |
| qwen3-vl-flash-2025-10-15 | $0.000052/1K (≤32K), $0.000079/1K (32K-128K), $0.000126/1K (128K-256K) | $0.00042/1K (≤32K), $0.00063/1K (32K-128K), $0.00101/1K (128K-256K) |
| Model | Input Price | Output Price |
|---|---|---|
| qwen-vl-max | $0.00084/1K tokens | $0.00335/1K tokens |
| qwen-vl-max-latest | $0.00084/1K tokens | $0.00335/1K tokens |
| qwen-vl-max-2025-08-13 | $0.00084/1K tokens | $0.00335/1K tokens |
| qwen-vl-max-2025-04-08 | $0.00084/1K tokens | $0.00335/1K tokens |
| Model | Input Price | Output Price |
|---|---|---|
| qwen-vl-plus | $0.00022/1K tokens | $0.00066/1K tokens |
| qwen-vl-plus-latest | $0.00022/1K tokens | $0.00066/1K tokens |
| qwen-vl-plus-2025-08-15 | $0.00022/1K tokens | $0.00066/1K tokens |
| qwen-vl-plus-2025-05-07 | $0.00022/1K tokens | $0.00066/1K tokens |
| qwen-vl-plus-2025-01-25 | $0.00022/1K tokens | $0.00066/1K tokens |
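To illustrate the tiered pricing above, here is a hypothetical cost estimator for qwen3-vl-plus, with rates hardcoded from the table. How the tier is selected (by input length, total context, or otherwise) is not specified on this page, so the selection rule below is an assumption; verify against current pricing before relying on it:

```python
# Tiered per-1K-token rates for qwen3-vl-plus, taken from the table above.
# Tiers: up to 32K, 32K-128K, 128K-256K (limit, input rate, output rate).
PLUS_RATES = [
    (32_000, 0.00021, 0.00168),
    (128_000, 0.00031, 0.00252),
    (256_000, 0.00063, 0.00503),
]

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost; tier chosen by input size (an assumption)."""
    for limit, in_rate, out_rate in PLUS_RATES:
        if input_tokens <= limit:
            return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate
    raise ValueError("input exceeds the 256K context limit")

# 10K input + 1K output tokens in the smallest tier:
# 10 * $0.00021 + 1 * $0.00168 = $0.00378
cost = estimate_cost(10_000, 1_000)
```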
All Qwen models must be referenced with the qwen: prefix, e.g. "model": "qwen:qwen3-vl-plus".

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | The model to use (e.g., qwen:qwen3-vl-plus) |
| messages | array | Yes | - | Array of message objects. Each message contains: role (one of system, user) and content (a string or an array; array items can contain image: image URL or base64-encoded image; video: video URL only, base64 is not supported; text: text content) |
| temperature | number | No | 0.7 | Controls randomness: 0.0-2.0, higher = more random |
| max_tokens | integer | No | 1024 | Maximum number of tokens to generate |
| top_p | number | No | 0.9 | Nucleus sampling: 0.0-1.0, consider tokens with top_p probability mass |
| frequency_penalty | number | No | 0.0 | Reduces repetition of frequent tokens: -2.0 to 2.0 |
| presence_penalty | number | No | 0.0 | Increases likelihood of new topics: -2.0 to 2.0 |
| n | integer | No | 1 | Number of completions to generate |
| stream | boolean | No | false | Whether to stream the response |
| stop | string \| array \| null | No | null | Stop sequences: a string, an array of strings, or null |
| extra_body | object | No | - | Additional body parameters: metadata (metadata object); enable_thinking (boolean, enables thinking mode); thinking_budget (integer thinking budget); response_format (format configuration with type set to json_schema and a json_schema object containing name, the schema name, and schema, the JSON schema definition) |
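Putting the parameters together, a sketch of a multimodal request body with structured output enabled. The media URLs and the schema name are hypothetical placeholders:

```python
import json

# Sketch of a request body; URLs below are placeholders, not real media.
body = {
    "model": "qwen:qwen3-vl-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful video analyst."},
        {
            "role": "user",
            "content": [
                {"video": "https://example.com/clip.mp4"},   # video must be a URL, not base64
                {"image": "https://example.com/frame.jpg"},  # image may be a URL or base64
                {"text": "Summarize the clip and describe the image."},
            ],
        },
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
    "stream": False,
    "extra_body": {
        "enable_thinking": True,
        "thinking_budget": 512,  # hypothetical budget value
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "clip_summary",  # hypothetical schema name
                "schema": {
                    "type": "object",
                    "properties": {"summary": {"type": "string"}},
                    "required": ["summary"],
                },
            },
        },
    },
}

encoded = json.dumps(body)  # ready to send as the POST body
```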
| Parameter | Type | Description |
|---|---|---|
| id | string | Unique identifier for the completion |
| object | string | Object type, always "completion" |
| model | string | The model used for the completion |
| created_at | integer | Unix timestamp of when the completion was created |
| status | string | Status of the completion (e.g., "completed") |
| choices | array | Array of completion choices |
| choices[].text | string | Text content of the completion |
| choices[].index | integer | Index of the choice in the choices array |
| usage | object | Token usage information |
| usage.input_tokens | integer | Number of input tokens used |
| usage.output_tokens | integer | Number of output tokens generated |
| usage.total_tokens | integer | Total number of tokens used |
| meta | object | Metadata about the completion |
| meta.provider | string | Provider name (e.g., "qwen") |
| meta.provider_model | string | Provider-specific model name |
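A small sketch of reading the fields above from a decoded response. The response dict here is a hand-built stand-in following the documented field names, not real API output:

```python
# Hand-built stand-in response using the fields documented above.
response = {
    "id": "example-id",
    "object": "completion",
    "model": "qwen3-vl-plus",
    "created_at": 1767095035,
    "status": "completed",
    "choices": [{"text": "A short description of the video.", "index": 0}],
    "usage": {"input_tokens": 1200, "output_tokens": 48, "total_tokens": 1248},
    "meta": {"provider": "qwen", "provider_model": "qwen3-vl-plus"},
}

if response["status"] == "completed":
    answer = response["choices"][0]["text"]   # generated text of the first choice
    total = response["usage"]["total_tokens"]  # billed token count
```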
Chat completion response with structured output. Example field values:

- id: "9212f247-2372-95b3-8c9b-968368a952fc"
- object: "completion"
- model: "qwen3-vl-plus"
- created_at: 1767095035
- status: "completed"