POST /ilm/chat/completions

Chat Completions (Gemini ILM)
curl --request POST \
  --url https://mavi-backend.memories.ai/serve/api/v2/ilm/chat/completions \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gemini:gemini-2.5-flash",
  "messages": [
    {
      "role": "system",
      "content": "<string>"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "n": 1,
  "stream": false,
  "stop": "<string>",
  "extra_body": {
    "metadata": {
      "thinking_config": {
        "thinking_budget": 123
      },
      "response_mime_type": "application/json",
      "responseSchema": {
        "type": "OBJECT",
        "properties": {},
        "required": [
          "<string>"
        ]
      }
    }
  }
}
'
This endpoint lets you generate chat completions with image inputs using a Gemini ILM model.
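The same request can be made without the OpenAI SDK. A minimal sketch using only Python's standard library (the URL and headers mirror the curl example above; `build_payload` is a hypothetical helper that fills in the documented defaults):

```python
import json
import urllib.request

API_URL = "https://mavi-backend.memories.ai/serve/api/v2/ilm/chat/completions"

def build_payload(model, messages, **overrides):
    """Assemble a request body with the documented defaults; overrides win."""
    payload = {
        "model": model,
        "messages": messages,
        "temperature": 0.7,
        "max_tokens": 1000,
        "top_p": 1,
        "frequency_penalty": 0,
        "presence_penalty": 0,
        "n": 1,
        "stream": False,
    }
    payload.update(overrides)
    return payload

def chat_completion(api_key, payload):
    """POST the payload to the endpoint and return the decoded JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```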

Request Body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | - | The model to use (e.g., gemini:gemini-2.5-flash) |
| messages | array | Yes | - | Array of message objects (structure below) |
| temperature | number | No | 0.7 | Controls randomness: 0.0-2.0, higher = more random |
| max_tokens | integer | No | 1000 | Maximum number of tokens to generate |
| top_p | number | No | 1.0 | Nucleus sampling: 0.0-1.0, consider tokens within top_p probability mass |
| frequency_penalty | number | No | 0.0 | Reduces repetition of frequent tokens: -2.0 to 2.0 |
| presence_penalty | number | No | 0.0 | Increases likelihood of new topics: -2.0 to 2.0 |
| n | integer | No | 1 | Number of completions to generate |
| stream | boolean | No | false | Whether to stream the response |
| stop | string \| array \| null | No | null | Stop sequences: a string, an array of strings, or null |
| extra_body | object | No | - | Additional body parameters (structure below) |

Each message object contains:
- role: Role type; one of system, user, assistant
- content: Message content; a string or an array. Array items can contain:
  - type: Content type, text or input_file
  - text: Text content (when type is text)
  - file_uri: File URL or base64-encoded file (when type is input_file)
  - mime_type: MIME type of the file (e.g., image/jpeg, video/mp4)

The extra_body object contains:
- metadata: Metadata object
  - thinking_config: Thinking configuration
    - thinking_budget: Integer thinking budget
  - response_mime_type: Response MIME type (application/json or json_schema)
  - responseSchema: JSON schema object for structured output
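As noted above, `file_uri` accepts either a URL or a base64-encoded file. A hedged sketch of assembling a user message that attaches raw image bytes as base64 (`image_content_item` and `user_message` are illustrative helpers, not part of the API; whether the backend expects a bare base64 string or a data: URI prefix is an assumption worth verifying against a plain-URL request first):

```python
import base64

def image_content_item(image_bytes, mime_type="image/png"):
    """Build an input_file content item from raw image bytes.

    Assumption: the backend accepts the bare base64 string in file_uri;
    some APIs instead expect a data: URI prefix.
    """
    return {
        "type": "input_file",
        "file_uri": base64.b64encode(image_bytes).decode("ascii"),
        "mime_type": mime_type,
    }

def user_message(prompt, *file_items):
    """Combine a text prompt and any number of file items into one user message."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": prompt}, *file_items],
    }
```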

Code Example

from openai import OpenAI

client = OpenAI(
    api_key="<api-key>",  # your Memories.ai API key
    base_url="https://mavi-backend.memories.ai/serve/api/v2/ilm"
)

def call_my_vlm():
    resp = client.chat.completions.create(
        model="gemini:gemini-2.5-flash",  # or qwen:vl-30b-a3b-instruct
        messages=[
            {"role": "system", "content": "You are a multimodal assistant. Keep your answers concise."},
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Please summarize the content of this video and image"
                    },
                    {
                        "type": "input_file",
                        "file_uri": "https://storage.googleapis.com/memories-test-data/gun5.png",  # URL or base64
                        "mime_type": "image/png"
                    }
                ]
            }
        ],
        temperature=0.7,  # Controls randomness: 0.0-2.0, higher = more random
        max_tokens=1000,  # Maximum number of tokens to generate
        top_p=1.0,  # Nucleus sampling: 0.0-1.0, consider tokens with top_p probability mass
        frequency_penalty=0.0,  # -2.0 to 2.0, reduces repetition of frequent tokens
        presence_penalty=0.0,  # -2.0 to 2.0, increases likelihood of new topics
        n=1,  # Number of completions to generate
        stream=False,  # Whether to stream the response
        stop=None,  # Stop sequences (list of strings)
        extra_body={
            "metadata": {
                "thinking_config": {
                    "thinking_budget": 1024
                },
                "response_mime_type": "application/json",  # application/json, json_schema
                "responseSchema": {
                    "type": "OBJECT",
                    "properties": {
                        "video_summary": {
                            "type": "STRING",
                            "description": "Summary of the video content from 1 second to 8 seconds."
                        },
                        "image_summary": {
                            "type": "STRING",
                            "description": "Summary of the image content."
                        }
                    },
                    "required": [
                        "video_summary",
                        "image_summary"
                    ]
                }
            }
        }
    )
    return resp

# Usage example
result = call_my_vlm()
print(result)
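When `response_mime_type` is set to `application/json`, the text of each choice should itself be a JSON document matching `responseSchema`. A lightweight sketch of decoding it and checking the schema's required keys (`parse_structured_choice` is an illustrative helper, not part of the API; this checks key presence only, not full JSON Schema validation):

```python
import json

def parse_structured_choice(choice_text, required=("video_summary", "image_summary")):
    """Decode a structured-output choice and verify the required keys exist."""
    data = json.loads(choice_text)
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"response missing required keys: {missing}")
    return data
```

For example, `parse_structured_choice(resp.choices[0].text)` would raise if the model omitted either summary field.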

Response

Returns the chat completion response with structured output.
{
  "id": "resp_f8d13263-95b3-4337-b4c9-dbe9f6eb1e43",
  "object": "completion",
  "model": "gemini:gemini-2.5-flash",
  "created_at": 1767093024,
  "status": "completed",
  "choices": [
    {
      "text": "This image shows a humorous scene presented from a first-person perspective (FPS).\n\n**Main Scene:**\n*   In the center of the frame, both hands are holding weapons",
      "index": 0
    }
  ],
  "usage": {
    "input_tokens": 1812,
    "output_tokens": 38,
    "total_tokens": 1850
  },
  "meta": {
    "provider": "gemini",
    "provider_model": "gemini-2.5-flash"
  }
}

Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for the completion |
| object | string | Object type, always "completion" |
| model | string | The model used for the completion |
| created_at | integer | Unix timestamp of when the completion was created |
| status | string | Status of the completion (e.g., "completed") |
| choices | array | Array of completion choices |
| choices[].text | string | Text content of the completion |
| choices[].index | integer | Index of the choice in the choices array |
| usage | object | Token usage information |
| usage.input_tokens | integer | Number of input tokens used |
| usage.output_tokens | integer | Number of output tokens generated |
| usage.total_tokens | integer | Total number of tokens used |
| meta | object | Metadata about the completion |
| meta.provider | string | Provider name (e.g., "gemini") |
| meta.provider_model | string | Provider-specific model name |
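These fields map directly onto dictionary access. A small sketch that pulls out the values most callers need, using the sample response above (`summarize_response` is an illustrative helper):

```python
def summarize_response(resp):
    """Extract the commonly used fields from a completion response dict."""
    choice = resp["choices"][0]
    return {
        "text": choice["text"],
        "status": resp["status"],
        "total_tokens": resp["usage"]["total_tokens"],
        "provider_model": resp["meta"]["provider_model"],
    }

# Abbreviated copy of the sample response from this page.
sample = {
    "id": "resp_f8d13263-95b3-4337-b4c9-dbe9f6eb1e43",
    "status": "completed",
    "choices": [{"text": "This image shows a humorous scene...", "index": 0}],
    "usage": {"input_tokens": 1812, "output_tokens": 38, "total_tokens": 1850},
    "meta": {"provider": "gemini", "provider_model": "gemini-2.5-flash"},
}
```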

Authorizations

Authorization (string, header, required)

Body

application/json

- model (string, required): The model to use. Example: "gemini:gemini-2.5-flash"
- messages (object[], required): Array of message objects
- temperature (number, default 0.7): Controls randomness; higher = more random. Range: 0 <= x <= 2
- max_tokens (integer, default 1000): Maximum number of tokens to generate
- top_p (number, default 1): Nucleus sampling. Range: 0 <= x <= 1
- frequency_penalty (number, default 0): Reduces repetition of frequent tokens. Range: -2 <= x <= 2
- presence_penalty (number, default 0): Increases likelihood of new topics. Range: -2 <= x <= 2
- n (integer, default 1): Number of completions to generate
- stream (boolean, default false): Whether to stream the response
- stop (string | array | null): Stop sequences
- extra_body (object): Additional body parameters

Response

200 - application/json: Chat completion response

- id (string, required): Unique identifier for the completion. Example: "resp_f8d13263-95b3-4337-b4c9-dbe9f6eb1e43"
- object (string, required): Object type, always "completion"
- model (string, required): The model used for the completion. Example: "gemini:gemini-2.5-flash"
- created_at (integer, required): Unix timestamp of when the completion was created. Example: 1767093024
- status (string, required): Status of the completion. Example: "completed"
- choices (object[], required): Array of completion choices
- usage (object, required): Token usage information
- meta (object, required): Metadata about the completion