Video Frame Description

Product: Visual Intelligence, Video Task APIs (application layer)
Use case: Pre-packaged video analysis tasks built on top of the Video Model APIs. Each task has a fixed prompt and workflow: you POST a request for a video and get a structured result. For full control over prompt and model selection, see Video Model APIs.
Host: https://mavi-backend.memories.ai/serve/api/v2
Auth: Authorization: sk-mavi-... (no Bearer prefix)

Analyzes video frames using a Gemini model and produces timestamped descriptions of what is visually happening (scene changes, on-screen actions, objects) rather than transcribing spoken words.

Not the same as Video Caption: Video Caption runs on a dedicated security.memories.ai service, supports Human ReID (named person identification), and requires a separate API key. Use Video Frame Description for standard frame-by-frame analysis on assets you've uploaded via the main API.

This is an async endpoint. Configure a webhook URL in Webhooks Settings before calling it; otherwise you will not receive the processing results. See Webhooks Configuration Guide for details.

Pricing:

Input tokens: $0.45/1M tokens
Output tokens: $3.75/1M tokens
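
As a rough illustration of how the two rates combine, the sketch below estimates the cost of one task from token counts such as those reported in the callback's `usage_metadata`. The helper name `estimate_cost_usd` is hypothetical. It assumes that `prompt_tokens` is billed at the input rate, that `output_tokens` is billed at the output rate, and that the listed rates apply to every supported model; this page does not confirm any of those.

```python
from decimal import Decimal

# Rates copied from the Pricing list above, in USD per 1M tokens.
INPUT_USD_PER_MILLION = Decimal("0.45")
OUTPUT_USD_PER_MILLION = Decimal("3.75")
ONE_MILLION = Decimal(1_000_000)


def estimate_cost_usd(prompt_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one task (hypothetical helper, not part of the API)."""
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Decimal avoids binary floating-point rounding in currency arithmetic.
    input_cost = Decimal(prompt_tokens) / ONE_MILLION * INPUT_USD_PER_MILLION
    output_cost = Decimal(output_tokens) / ONE_MILLION * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost
```

For instance, 200,000 prompt tokens and 10,000 output tokens would come to 0.2 × $0.45 + 0.01 × $3.75 = $0.1275 under these assumptions.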

Supported Models

gemini-2.5-flash-lite
gemini-2.5-flash
gemini-2.5-flash-preview-09-2025
gemini-2.5-flash-lite-preview-09-2025

Request Body

Parameter	Type	Required	Description
asset_id	string	Yes	The unique identifier of the video asset to generate visual descriptions for
model	string	Yes	The model to use for video description (e.g., `gemini-2.5-flash-lite`)

Code Example

curl --request POST \
  --url https://mavi-backend.memories.ai/serve/api/v2/transcriptions/async-generate-video \
  --header 'Authorization: sk-mavi-...' \
  --header 'Content-Type: application/json' \
  --data '{
    "asset_id": "re_657929111888723968",
    "model": "gemini-2.5-flash-lite"
  }'
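
The same request can be sketched in Python using only the standard library. The function name `build_description_request` and the client-side model check are assumptions added for illustration; the URL, headers, and body fields mirror the curl example above. The function only builds the request, so nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "https://mavi-backend.memories.ai/serve/api/v2"
ENDPOINT = BASE_URL + "/transcriptions/async-generate-video"

# Copied from the Supported Models list on this page.
SUPPORTED_MODELS = {
    "gemini-2.5-flash-lite",
    "gemini-2.5-flash",
    "gemini-2.5-flash-preview-09-2025",
    "gemini-2.5-flash-lite-preview-09-2025",
}


def build_description_request(
    api_key: str, asset_id: str, model: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request. Hypothetical helper."""
    # Client-side check is an assumption; the server remains the authority.
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    body = json.dumps({"asset_id": asset_id, "model": model}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            # The raw key is sent as-is, with no "Bearer " prefix.
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
    )
```

To send it, something like `urllib.request.urlopen(req, timeout=30)` would work; the timeout value is arbitrary.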

Response

Returns the ID of the queued video description task. The description itself is delivered later to your webhook.

{
  "code": 200,
  "msg": "success",
  "data": {
    "task_id": "ec2449885ba84c4f943a80ff0633158e"
  },
  "failed": false,
  "success": true
}

Response Parameters

Parameter	Type	Description
code	string	Response code indicating the result status (the example response above shows it as the number 200)
msg	string	Response message describing the operation result
data	object	Response data object containing task information
data.task_id	string	Unique identifier of the video description task
success	boolean	Indicates whether the operation was successful
failed	boolean	Indicates whether the operation failed
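
A minimal sketch of reading this response: it checks the `success` and `failed` flags and returns `data.task_id`, which you would store to match against the later callback. The function name `extract_task_id` is hypothetical, and the shape of a failed response is assumed to use the same top-level fields; this page only documents the success case.

```python
import json


def extract_task_id(response_body: str) -> str:
    """Return data.task_id from the submission response (hypothetical helper)."""
    payload = json.loads(response_body)
    # Assumption: failures use the same envelope with success=false / failed=true.
    if not payload.get("success") or payload.get("failed"):
        raise RuntimeError(
            f"submission failed: code={payload.get('code')!r} "
            f"msg={payload.get('msg')!r}"
        )
    task_id = (payload.get("data") or {}).get("task_id")
    if not task_id:
        raise RuntimeError("response did not contain data.task_id")
    return task_id
```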

Callback Response Parameters

When the video description is complete, a callback will be sent to your configured webhook URL.

Parameter	Type	Description
code	string	Response code (200 indicates success)
message	string	Status message (e.g., `SUCCESS`)
data	object	Response data object containing the description result and metadata
data.data	object	Inner data object containing description segments and usage information
data.data.data	array	Array of description segments with timestamps
data.data.data[].start_time	number	Start time of the segment in seconds
data.data.data[].end_time	number	End time of the segment in seconds
data.data.data[].transcript	string	Visual description text for this time segment
data.data.error_rate	number	Error rate of the description (0.0 means no errors)
data.data.usage_metadata	object	Usage statistics for the API call
data.data.usage_metadata.duration	number	Processing duration in seconds
data.data.usage_metadata.model	string	The AI model used for description (e.g., `gemini-2.5-flash-lite`)
data.data.usage_metadata.output_tokens	integer	Number of tokens in the generated description
data.data.usage_metadata.prompt_tokens	integer	Number of tokens in the input prompt
data.msg	string	Detailed message about the operation result
data.success	boolean	Indicates whether the description was successful
task_id	string	The task ID associated with this description request
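
The nesting in the table above (`data.data.data[]`) is easy to misread, so here is a sketch of unpacking an already-decoded callback body into a flat list of segments. The names `Segment` and `parse_callback` are hypothetical, and `SAMPLE_CALLBACK` is a hand-written payload that follows the table, trimmed to the fields the parser reads; its values are invented and are not real API output.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Segment:
    start_time: float  # seconds
    end_time: float    # seconds
    description: str   # taken from the "transcript" field


def parse_callback(payload: dict[str, Any]) -> tuple[str, list[Segment]]:
    """Return (task_id, segments) from a decoded callback body."""
    outer = payload.get("data") or {}
    if not outer.get("success"):
        raise RuntimeError(f"description failed: {outer.get('msg')!r}")
    inner = outer.get("data") or {}
    segments = [
        Segment(float(s["start_time"]), float(s["end_time"]), s["transcript"])
        for s in inner.get("data", [])
    ]
    return payload["task_id"], segments


# Invented sample shaped like the table above; not real output.
SAMPLE_CALLBACK = {
    "task_id": "ec2449885ba84c4f943a80ff0633158e",
    "data": {
        "success": True,
        "msg": "ok",
        "data": {
            "data": [
                {"start_time": 0, "end_time": 4.5, "transcript": "A car enters the frame."},
                {"start_time": 4.5, "end_time": 9, "transcript": "The car parks."},
            ],
        },
    },
}
```

Your webhook handler would decode the request body as JSON, call `parse_callback`, and use the returned `task_id` to find the request it belongs to.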
