Gemini ILM Chat Completions

curl --request POST \
  --url https://mavi-backend.memories.ai/serve/api/v2/ilm/chat/completions \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gemini:gemini-2.5-flash",
  "messages": [
    {
      "role": "system",
      "content": "<string>"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "n": 1,
  "stream": false,
  "stop": "<string>",
  "extra_body": {
    "metadata": {
      "thinking_config": {
        "thinking_budget": 123
      },
      "response_mime_type": "application/json",
      "responseSchema": {
        "type": "OBJECT",
        "properties": {},
        "required": [
          "<string>"
        ]
      }
    }
  }
}
'

{
  "id": "resp_f8d13263-95b3-4337-b4c9-dbe9f6eb1e43",
  "object": "completion",
  "model": "gemini:gemini-2.5-flash",
  "created_at": 1767093024,
  "status": "completed",
  "choices": [
    {
      "text": "This image shows a humorous scene presented from a first-person perspective (FPS).\n\n**Main Scene:**\n*   In the center of the frame, both hands are holding weapons",
      "index": 0
    }
  ],
  "usage": {
    "input_tokens": 1812,
    "output_tokens": 38,
    "total_tokens": 1850
  },
  "meta": {
    "provider": "gemini",
    "provider_model": "gemini-2.5-flash"
  }
}

This endpoint generates chat completions with image inputs using the Gemini ILM model.
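As a rough illustration of the request shape, the sketch below builds the same POST request using only the Python standard library. The endpoint URL, the header names, and the `input_file` content item come from this page; the helper name `build_request`, the prompt, and the example file URL are hypothetical. The request is built but not sent, so the snippet runs offline.

```python
import json
import urllib.request

API_URL = "https://mavi-backend.memories.ai/serve/api/v2/ilm/chat/completions"


def build_request(api_key, prompt, file_uri, mime_type):
    """Build (but do not send) a POST request with one text prompt and one file."""
    payload = {
        "model": "gemini:gemini-2.5-flash",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "input_file", "file_uri": file_uri, "mime_type": mime_type},
                ],
            }
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical inputs, for illustration only.
req = build_request(
    "<api-key>", "Describe this image.", "https://example.com/cat.png", "image/png"
)
# urllib.request.urlopen(req) would send the request; it is omitted here.
```

Keeping request construction separate from sending makes the payload easy to inspect before any tokens are spent.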

Request Body

Parameter	Type	Required	Default	Description
model	string	Yes	-	The model to use (e.g., `gemini:gemini-2.5-flash`)
messages	array	Yes	-	Array of message objects. Each message has a `role` (`system`, `user`, or `assistant`) and a `content`, which is either a string or an array of items. Each array item has a `type` (`text` or `input_file`). A `text` item carries `text` (the text content). An `input_file` item carries `file_uri` (a file URL or base64-encoded file) and `mime_type` (e.g., `image/jpeg`, `video/mp4`).
temperature	number	No	0.7	Controls randomness: 0.0-2.0, higher = more random
max_tokens	integer	No	1000	Maximum number of tokens to generate
top_p	number	No	1.0	Nucleus sampling: 0.0-1.0, consider tokens with top_p probability mass
frequency_penalty	number	No	0.0	Reduces repetition of frequent tokens: -2.0 to 2.0
presence_penalty	number	No	0.0	Increases likelihood of new topics: -2.0 to 2.0
n	integer	No	1	Number of completions to generate
stream	boolean	No	false	Whether to stream the response
stop	string \| array \| null	No	null	Stop sequences. Can be a string, array of strings, or null
extra_body	object	No	-	Additional body parameters. Contains a `metadata` object with: `thinking_config` (thinking configuration, holding an integer `thinking_budget`), `response_mime_type` (response MIME type, `application/json` or `json_schema`), and `responseSchema` (JSON schema object for structured output).
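The `messages` structure described in the table can be assembled with a couple of small helpers. This is a minimal sketch: the helper names `file_item_from_bytes` and `user_message` are hypothetical, and it assumes the base64 form of `file_uri` is the raw base64 string with no `data:` prefix, which this page does not specify.

```python
import base64
import mimetypes


def file_item_from_bytes(data: bytes, filename: str) -> dict:
    """Build an input_file content item from raw bytes.

    Assumption: file_uri accepts a plain base64 string (no data: URI prefix).
    """
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        raise ValueError(f"cannot infer MIME type for {filename!r}")
    return {
        "type": "input_file",
        "file_uri": base64.b64encode(data).decode("ascii"),
        "mime_type": mime_type,
    }


def user_message(text: str, *file_items: dict) -> dict:
    """Build a user message whose content is a text item followed by file items."""
    return {"role": "user", "content": [{"type": "text", "text": text}, *file_items]}


# Placeholder bytes stand in for a real image file.
item = file_item_from_bytes(b"\x89PNG", "frame.png")
message = user_message("Describe this image.", item)
```

Inferring `mime_type` from the file name keeps the declared type and the actual file in step, which is easy to get wrong when the value is typed by hand.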

Code Example

from openai import OpenAI

client = OpenAI(
    api_key="<api-key>",  # replace with your API key; do not commit real keys
    base_url="https://mavi-backend.memories.ai/serve/api/v2/ilm"
)

def call_my_vlm():
    resp = client.chat.completions.create(
        model="gemini:gemini-2.5-flash",  # or qwen:vl-30b-a3b-instruct
        messages=[
            {"role": "system", "content": "You are a multimodal assistant. Keep your answers concise."},
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Please summarize the content of this video and image"
                    },
                    {
                        "type": "input_file",
                        "file_uri": "https://storage.googleapis.com/memories-test-data/gun5.png",  # base64 or url
                        "mime_type": "image/png"  # matches the .png file above
                    }
                ]
            }
        ],
        temperature=0.7,  # Controls randomness: 0.0-2.0, higher = more random
        max_tokens=1000,  # Maximum number of tokens to generate
        top_p=1.0,  # Nucleus sampling: 0.0-1.0, consider tokens with top_p probability mass
        frequency_penalty=0.0,  # -2.0 to 2.0, reduces repetition of frequent tokens
        presence_penalty=0.0,  # -2.0 to 2.0, increases likelihood of new topics
        n=1,  # Number of completions to generate
        stream=False,  # Whether to stream the response
        stop=None,  # Stop sequences: a string, a list of strings, or None
        extra_body={
            "metadata": {
                "thinking_config": {
                    "thinking_budget": 1024
                },
                "response_mime_type": "application/json",  # application/json, json_schema
                "responseSchema": {
                    "type": "OBJECT",
                    "properties": {
                        "video_summary": {
                            "type": "STRING",
                            "description": "Summary of the video content from 1 second to 8 seconds."
                        },
                        "image_summary": {
                            "type": "STRING",
                            "description": "Summary of the image content."
                        }
                    },
                    "required": [
                        "video_summary",
                        "image_summary"
                    ]
                }
            }
        }
    )
    return resp

# Usage example
result = call_my_vlm()
print(result)

Response

Returns the chat completion response with structured output.

{
  "id": "resp_f8d13263-95b3-4337-b4c9-dbe9f6eb1e43",
  "object": "completion",
  "model": "gemini:gemini-2.5-flash",
  "created_at": 1767093024,
  "status": "completed",
  "choices": [
    {
      "text": "This image shows a humorous scene presented from a first-person perspective (FPS).\n\n**Main Scene:**\n*   In the center of the frame, both hands are holding weapons",
      "index": 0
    }
  ],
  "usage": {
    "input_tokens": 1812,
    "output_tokens": 38,
    "total_tokens": 1850
  },
  "meta": {
    "provider": "gemini",
    "provider_model": "gemini-2.5-flash"
  }
}

Response Parameters

Parameter	Type	Description
id	string	Unique identifier for the completion
object	string	Object type, always “completion”
model	string	The model used for the completion
created_at	integer	Unix timestamp of when the completion was created
status	string	Status of the completion (e.g., “completed”)
choices	array	Array of completion choices
choices[].text	string	Text content of the completion
choices[].index	integer	Index of the choice in the choices array
usage	object	Token usage information
usage.input_tokens	integer	Number of input tokens used
usage.output_tokens	integer	Number of output tokens generated
usage.total_tokens	integer	Total number of tokens used
meta	object	Metadata about the completion
meta.provider	string	Provider name (e.g., “gemini”)
meta.provider_model	string	Provider-specific model name
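The fields in the table above can be read out of the raw JSON body with the standard library. The sketch below is illustrative only: the `Completion` dataclass and `parse_completion` helper are hypothetical names, and the sample body is a shortened copy of the response shown on this page.

```python
import json
from dataclasses import dataclass


@dataclass
class Completion:
    id: str
    status: str
    texts: list
    total_tokens: int


def parse_completion(raw: str) -> Completion:
    """Extract the id, status, choice texts (ordered by index), and token total."""
    body = json.loads(raw)
    choices = sorted(body["choices"], key=lambda choice: choice["index"])
    return Completion(
        id=body["id"],
        status=body["status"],
        texts=[choice["text"] for choice in choices],
        total_tokens=body["usage"]["total_tokens"],
    )


# Shortened copy of the sample response, with the text truncated for brevity.
sample = """
{
  "id": "resp_f8d13263-95b3-4337-b4c9-dbe9f6eb1e43",
  "object": "completion",
  "model": "gemini:gemini-2.5-flash",
  "created_at": 1767093024,
  "status": "completed",
  "choices": [{"text": "This image shows a humorous scene", "index": 0}],
  "usage": {"input_tokens": 1812, "output_tokens": 38, "total_tokens": 1850},
  "meta": {"provider": "gemini", "provider_model": "gemini-2.5-flash"}
}
"""

completion = parse_completion(sample)
```

Note that the completion text lives in `choices[].text`, not in a nested `message` object.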

Authorizations

Parameter	Type	In	Required
Authorization	string	header	Yes

Body

application/json

Parameter	Type	Required	Default	Required range	Description
model	string	Yes	-	-	The model to use (e.g., `gemini:gemini-2.5-flash`)
messages	object[]	Yes	-	-	Array of message objects
temperature	number	No	0.7	0 <= x <= 2	Controls randomness; higher = more random
max_tokens	integer	No	1000	-	Maximum number of tokens to generate
top_p	number	No	1	0 <= x <= 1	Nucleus sampling
frequency_penalty	number	No	0	-2 <= x <= 2	Reduces repetition of frequent tokens
presence_penalty	number	No	0	-2 <= x <= 2	Increases likelihood of new topics
n	integer	No	1	-	Number of completions to generate
stream	boolean	No	false	-	Whether to stream the response
stop	string \| array \| null	No	-	-	Stop sequences
extra_body	object	No	-	-	Additional body parameters
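The required ranges listed for the body parameters can be checked on the client before a request is sent. This is a minimal sketch; the `RANGES` table copies the bounds stated on this page, while the `check_ranges` helper and its message format are hypothetical and not part of the API.

```python
# Inclusive bounds copied from the parameter descriptions on this page.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def check_ranges(params: dict) -> list:
    """Return one message per parameter that falls outside its documented range."""
    errors = []
    for name, (low, high) in RANGES.items():
        if name in params and not (low <= params[name] <= high):
            errors.append(f"{name}={params[name]} is outside [{low}, {high}]")
    return errors


ok = check_ranges({"temperature": 0.7, "top_p": 1})
bad = check_ranges({"temperature": 2.5, "presence_penalty": -3})
```

Parameters that are absent from `params` are skipped, so the server-side defaults still apply.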

Response

200 - application/json

Chat completion response

Parameter	Type	Required	Description	Example
id	string	Yes	Unique identifier for the completion	"resp_f8d13263-95b3-4337-b4c9-dbe9f6eb1e43"
object	string	Yes	Object type, always 'completion'	"completion"
model	string	Yes	The model used for the completion	"gemini:gemini-2.5-flash"
created_at	integer	Yes	Unix timestamp of when the completion was created	1767093024
status	string	Yes	Status of the completion	"completed"
choices	object[]	Yes	Array of completion choices	-
usage	object	Yes	Token usage information	-
