Chat Completions Gemini

POST /vu/chat/completions
curl --request POST \
  --url https://mavi-backend.memories.ai/serve/api/v2/vu/chat/completions \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gemini:gemini-2.5-flash",
  "messages": [
    {
      "role": "system",
      "content": "<string>"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "n": 1,
  "stream": false,
  "stop": "<string>",
  "extra_body": {
    "metadata": {
      "thinking_config": {
        "thinking_budget": 123
      },
      "response_mime_type": "application/json",
      "responseSchema": {
        "type": "OBJECT",
        "properties": {},
        "required": [
          "<string>"
        ]
      }
    }
  }
}
'


This endpoint generates chat completions from text, video, and image inputs using the Gemini VLM models.

Endpoint: POST https://mavi-backend.memories.ai/serve/api/v2/vu/chat/completions

Video Understanding (VLM) endpoints use the /vu path prefix; Image Understanding (ILM) endpoints use /iu instead.

Supported Models

All models require the gemini: prefix when used in the model parameter (e.g., gemini:gemini-2.5-flash).

Premium Models

| Model | Input Price | Output Price |
| --- | --- | --- |
| gemini-3-pro-preview | $2/1M (≤200K), $4/1M (>200K) | $12/1M (≤200K), $18/1M (>200K) |
| gemini-2.5-pro | $1.25/1M (≤200K), $2.50/1M (>200K) | $10/1M (≤200K), $15/1M (>200K) |

Flash Models (High Performance)

| Model | Input Price | Output Price |
| --- | --- | --- |
| gemini-3-flash-preview | $0.50/1M tokens | $3/1M tokens |
| gemini-2.5-flash | $0.30/1M tokens | $2.50/1M tokens |
| gemini-2.5-flash-preview-09-2025 | $0.30/1M tokens | $2.50/1M tokens |
| gemini-2.0-flash | $0.10/1M tokens | $0.40/1M tokens |

Lite Models (Cost-Effective)

| Model | Input Price | Output Price |
| --- | --- | --- |
| gemini-2.5-flash-lite | $0.10/1M tokens | $0.40/1M tokens |
| gemini-2.5-flash-lite-preview-09-2025 | $0.10/1M tokens | $0.40/1M tokens |
| gemini-2.0-flash-lite | $0.075/1M tokens | $0.30/1M tokens |
When using these models, remember to include the gemini: prefix in your API calls:
  • ✅ Correct: "model": "gemini:gemini-2.5-flash"
  • ❌ Incorrect: "model": "gemini-2.5-flash"
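Per-request cost can be estimated from the usage block the API returns together with the per-token rates above. The sketch below covers only the flat-rate models (tiered models such as gemini-2.5-pro depend on prompt size); the rate table and helper name are illustrative, not part of the API:

```python
# Sketch: estimate request cost (USD) from the usage block and the flat
# per-1M-token rates listed in the tables above. Tiered models (e.g.
# gemini-2.5-pro) are omitted because their rate depends on prompt size.
RATES = {
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.0-flash": (0.10, 0.40),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "gemini-2.0-flash-lite": (0.075, 0.30),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    # Accept the model name with or without the gemini: prefix.
    in_rate, out_rate = RATES[model.removeprefix("gemini:")]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# The usage block from the example response: 16,983 input / 37 output tokens.
cost = estimate_cost("gemini:gemini-2.5-flash", 16983, 37)
print(f"${cost:.6f}")  # → $0.005187
```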

Request Body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | - | The model to use (e.g., gemini:gemini-2.5-flash) |
| messages | array | Yes | - | Array of message objects (see below) |
| temperature | number | No | 0.7 | Controls randomness: 0.0-2.0, higher = more random |
| max_tokens | integer | No | 1000 | Maximum number of tokens to generate |
| top_p | number | No | 1.0 | Nucleus sampling: 0.0-1.0, consider tokens within the top_p probability mass |
| frequency_penalty | number | No | 0.0 | Reduces repetition of frequent tokens: -2.0 to 2.0 |
| presence_penalty | number | No | 0.0 | Increases likelihood of new topics: -2.0 to 2.0 |
| n | integer | No | 1 | Number of completions to generate |
| stream | boolean | No | false | Whether to stream the response |
| stop | string \| array \| null | No | null | Stop sequences: a string, an array of strings, or null |
| extra_body | object | No | - | Additional body parameters (see below) |

Each message object contains:
- role: Role type; one of system, user, assistant
- content: Message content, either a string or an array. Array items can contain:
  - type: Content type, text or input_file
  - text: Text content (when type is text)
  - file_uri: File URL or base64-encoded file (when type is input_file). Note: video does not support base64
  - mime_type: MIME type of the file (e.g., image/jpeg, video/mp4)

The extra_body object contains:
- metadata: Metadata object
  - thinking_config: Thinking configuration
    - thinking_budget: Integer value for the thinking budget
  - response_mime_type: Response MIME type (application/json or json_schema)
  - responseSchema: JSON schema object for structured output
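The multimodal content array described above can be assembled programmatically. The helper below is a hypothetical sketch (the function name is not part of the API) that builds a user message from a text prompt plus a list of (file_uri, mime_type) pairs:

```python
# Hypothetical helper (not part of the API): build a multimodal user message
# in the shape the `messages` parameter expects.
def build_user_message(prompt: str, files: list[tuple[str, str]]) -> dict:
    """files is a list of (file_uri, mime_type) pairs. Video URIs must be
    URLs, since base64 is not supported for video."""
    content = [{"type": "text", "text": prompt}]
    for file_uri, mime_type in files:
        content.append({
            "type": "input_file",
            "file_uri": file_uri,
            "mime_type": mime_type,
        })
    return {"role": "user", "content": content}

msg = build_user_message(
    "Please summarize the content of this video",
    [("https://example.com/clip.mp4", "video/mp4")],
)
```

The returned dict can be placed directly into the messages array of the request body.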

Code Example

from openai import OpenAI

client = OpenAI(
    api_key="sk-mai-this_a_test_string_please_use_your_generated_key_during_testing",
    base_url="https://mavi-backend.memories.ai/serve/api/v2/vu"
)

def call_my_vlm():
    resp = client.chat.completions.create(
        model="gemini:gemini-2.5-flash",  # e.g. gemini:gemini-3-flash-preview or gemini:gemini-2.5-flash
        messages=[
            {"role": "system", "content": "You are a multimodal assistant. Keep your answers concise."},
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Please summarize the content of this video and image"
                    },
                    {
                        "type": "input_file",
                        "file_uri": "https://storage.googleapis.com/memories-test-data/gun5.png",  # base64 or url, video does not support base64
                        "mime_type": "image/png"
                    },
                    {
                        "type": "input_file",
                        "file_uri": "https://storage.googleapis.com/memories-test-data/test_1min.mp4",  # base64 or url, video does not support base64
                        "mime_type": "video/mp4"
                    }
                ]
            }
        ],
        temperature=0.7,  # Controls randomness: 0.0-2.0, higher = more random
        max_tokens=1000,  # Maximum number of tokens to generate
        top_p=1.0,  # Nucleus sampling: 0.0-1.0, consider tokens with top_p probability mass
        frequency_penalty=0.0,  # -2.0 to 2.0, reduces repetition of frequent tokens
        presence_penalty=0.0,  # -2.0 to 2.0, increases likelihood of new topics
        n=1,  # Number of completions to generate
        stream=False,  # Whether to stream the response
        stop=None,  # Stop sequences (list of strings)
        extra_body={
            "metadata": {
                "thinking_config": {
                    "thinking_budget": 1024
                },
                "response_mime_type": "application/json",  # application/json, json_schema
                "responseSchema": {
                    "type": "OBJECT",
                    "properties": {
                        "video_summary": {
                            "type": "STRING",
                            "description": "Summary of the video content from 1 second to 8 seconds."
                        },
                        "image_summary": {
                            "type": "STRING",
                            "description": "Summary of the image content."
                        }
                    },
                    "required": [
                        "video_summary",
                        "image_summary"
                    ]
                }
            }
        }
    )
    return resp

# Usage example
result = call_my_vlm()
print(result)

Response

Returns the chat completion response with structured output.
{
  "id": "resp_5810813e-99b9-427a-8736-23cf34573627",
  "object": "completion",
  "model": "gemini:gemini-2.5-flash",
  "created_at": 1767093284,
  "status": "completed",
  "choices": [
    {
      "text": "This video demonstrates the effects of vacuum on three different states of oranges.\n\n1. **Setup**: Three orange-related items are placed in a vacuum chamber: a whole orange",
      "index": 0
    }
  ],
  "usage": {
    "input_tokens": 16983,
    "output_tokens": 37,
    "total_tokens": 17020
  },
  "meta": {
    "provider": "gemini",
    "provider_model": "gemini-2.5-flash"
  }
}

Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for the completion |
| object | string | Object type, always "completion" |
| model | string | The model used for the completion |
| created_at | integer | Unix timestamp of when the completion was created |
| status | string | Status of the completion (e.g., "completed") |
| choices | array | Array of completion choices |
| choices[].text | string | Text content of the completion |
| choices[].index | integer | Index of the choice in the choices array |
| usage | object | Token usage information |
| usage.input_tokens | integer | Number of input tokens used |
| usage.output_tokens | integer | Number of output tokens generated |
| usage.total_tokens | integer | Total number of tokens used |
| meta | object | Metadata about the completion |
| meta.provider | string | Provider name (e.g., "gemini") |
| meta.provider_model | string | Provider-specific model name |
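When response_mime_type is set to application/json, the model's structured output arrives as a JSON string inside choices[].text and must be decoded by the caller. A minimal sketch, assuming the response body has already been deserialized into a dict shaped like the example above (the helper name is illustrative):

```python
import json

# Sketch: pull the structured JSON string out of choices[0].text and decode it.
# `response` is an already-parsed response body shaped like the example above.
def extract_structured_output(response: dict) -> dict:
    text = response["choices"][0]["text"]
    return json.loads(text)

response = {
    "choices": [
        {
            "text": '{"video_summary": "A vacuum chamber demo.", '
                    '"image_summary": "An orange."}',
            "index": 0,
        }
    ],
}
data = extract_structured_output(response)
print(data["video_summary"])  # → A vacuum chamber demo.
```

Note that json.loads will raise if the model returned plain prose, so this only applies to requests that set response_mime_type.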

Authorizations

Authorization
string
header
required

Body

application/json
model
string
required

The model to use (e.g., gemini:gemini-2.5-flash)

Example:

"gemini:gemini-2.5-flash"

messages
object[]
required

Array of message objects

temperature
number
default:0.7

Controls randomness: 0.0-2.0, higher = more random

Required range: 0 <= x <= 2
max_tokens
integer
default:1000

Maximum number of tokens to generate

top_p
number
default:1

Nucleus sampling: 0.0-1.0

Required range: 0 <= x <= 1
frequency_penalty
number
default:0

Reduces repetition of frequent tokens: -2.0 to 2.0

Required range: -2 <= x <= 2
presence_penalty
number
default:0

Increases likelihood of new topics: -2.0 to 2.0

Required range: -2 <= x <= 2
n
integer
default:1

Number of completions to generate

stream
boolean
default:false

Whether to stream the response

stop
string | array | null

Stop sequences

extra_body
object

Additional body parameters for thinking configuration and structured output

Response

200 - application/json

Chat completion response

id
string
required

Unique identifier for the completion

Example:

"resp_5810813e-99b9-427a-8736-23cf34573627"

object
string
required

Object type, always 'completion'

Example:

"completion"

model
string
required

The model used for the completion

Example:

"gemini:gemini-2.5-flash"

created_at
integer
required

Unix timestamp of when the completion was created

Example:

1767093284

status
string
required

Status of the completion

Example:

"completed"

choices
object[]
required
usage
object
required
meta
object
required