Async Generate Video Transcription

POST /transcriptions/async-generate-video
This endpoint allows you to generate visual descriptions of video content asynchronously. It analyzes the video frames and produces timestamped descriptions of what appears on screen, rather than transcribing spoken audio.
This is an async endpoint. You must configure a webhook URL in Webhooks Settings before calling it; otherwise you will not receive the processing results. See the Webhooks Configuration Guide for details.
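Because results arrive via webhook, you need an HTTP endpoint listening for the callback. Below is a minimal receiver sketch using only Python's standard library; the path /mavi-webhook and the port are placeholders for whatever public URL you register in Webhooks Settings.

```python
# Minimal webhook receiver sketch (standard library only).
# The path and port are placeholders; register the matching public
# URL in Webhooks Settings.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class MaviWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON callback body.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # The callback carries the task_id returned by the async endpoint.
        print("callback received for task:", payload.get("task_id"))
        # Acknowledge with a 2xx so the sender knows delivery succeeded.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"received": true}')

# To run standalone:
# HTTPServer(("", 8000), MaviWebhookHandler).serve_forever()
```

Respond quickly with a 2xx and do any heavy processing of the description segments out of band, so the callback sender does not time out waiting on your handler.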
Pricing:
  • Input tokens: $0.45/1M tokens
  • Output tokens: $3.75/1M tokens
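As a worked example, here is what the sample call later on this page would cost at these rates, using the token counts from its callback (283,901 prompt tokens, 24,641 output tokens):

```python
# Back-of-the-envelope cost estimate at the listed rates.
INPUT_RATE = 0.45 / 1_000_000    # dollars per input token
OUTPUT_RATE = 3.75 / 1_000_000   # dollars per output token

def estimate_cost(prompt_tokens: int, output_tokens: int) -> float:
    return prompt_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Token counts taken from the sample callback on this page:
print(round(estimate_cost(283901, 24641), 4))  # → 0.2202
```

So the sample video description run comes to roughly $0.22, with input tokens dominating the bill despite the lower per-token rate.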

Supported Models

  • gemini-2.5-flash-lite
  • gemini-2.5-flash
  • gemini-2.5-flash-preview-09-2025
  • gemini-2.5-flash-lite-preview-09-2025

Request Body

Parameter | Type | Required | Description
--------- | ---- | -------- | -----------
asset_id | string | Yes | The unique identifier of the video asset to generate visual descriptions for
model | string | Yes | The model to use for video description (e.g., gemini-2.5-flash-lite)

Code Example

curl --request POST \
  --url https://mavi-backend.memories.ai/serve/api/v2/transcriptions/async-generate-video \
  --header 'Authorization: sk-mai-this_a_test_string_please_use_your_generated_key_during_testing' \
  --header 'Content-Type: application/json' \
  --data '{
    "asset_id": "re_657929111888723968",
    "model": "gemini-2.5-flash-lite"
  }'
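The same call can be made from Python using only the standard library, so it runs without extra dependencies. The helper names build_request and start_video_description are illustrative, not part of the API:

```python
# Issue the async-generate-video request from Python (stdlib only).
import json
import urllib.request

API_URL = "https://mavi-backend.memories.ai/serve/api/v2/transcriptions/async-generate-video"

def build_request(api_key: str, asset_id: str, model: str) -> urllib.request.Request:
    """Construct the POST request without sending it."""
    body = json.dumps({"asset_id": asset_id, "model": model}).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )

def start_video_description(api_key: str, asset_id: str, model: str) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(api_key, asset_id, model)) as resp:
        return json.load(resp)

# resp = start_video_description("sk-mai-...", "re_657929111888723968",
#                                "gemini-2.5-flash-lite")
# task_id = resp["data"]["task_id"]
```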

Response

Returns the video description task information.
{
  "code": 200,
  "msg": "success",
  "data": {
    "task_id": "ec2449885ba84c4f943a80ff0633158e"
  },
  "failed": false,
  "success": true
}

Response Parameters

Parameter | Type | Description
--------- | ---- | -----------
code | string | Response code indicating the result status
msg | string | Response message describing the operation result
data | object | Response data object containing task information
data.task_id | string | Unique identifier of the video description task
success | boolean | Indicates whether the operation was successful
failed | boolean | Indicates whether the operation failed
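In code, validating the immediate response and pulling out the task ID might look like this (extract_task_id is a hypothetical helper, not part of the API):

```python
# Check the immediate response and extract the task_id for later
# correlation with the webhook callback.
def extract_task_id(response: dict) -> str:
    if not response.get("success") or response.get("failed"):
        raise RuntimeError(f"request failed: {response.get('msg')}")
    return response["data"]["task_id"]

# Sample response from this page:
sample = {
    "code": 200, "msg": "success",
    "data": {"task_id": "ec2449885ba84c4f943a80ff0633158e"},
    "failed": False, "success": True,
}
print(extract_task_id(sample))  # → ec2449885ba84c4f943a80ff0633158e
```

Store the task_id: it is the only field linking this response to the callback that arrives later.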

Callback Response Parameters

When the video description is complete, a callback will be sent to your configured webhook URL.
Parameter | Type | Description
--------- | ---- | -----------
code | string | Response code (200 indicates success)
message | string | Status message (e.g., "SUCCESS")
data | object | Response data object containing the description result and metadata
data.data | object | Inner data object containing description segments and usage information
data.data.data | array | Array of description segments with timestamps
data.data.data[].start_time | number | Start time of the segment in seconds
data.data.data[].end_time | number | End time of the segment in seconds
data.data.data[].transcript | string | Visual description text for this time segment
data.data.error_rate | number | Error rate of the description (0.0 means no errors)
data.data.usage_metadata | object | Usage statistics for the API call
data.data.usage_metadata.duration | number | Processing duration in seconds
data.data.usage_metadata.model | string | The AI model used for description (e.g., "gemini-2.5-flash-lite")
data.data.usage_metadata.output_tokens | integer | Number of tokens in the generated description
data.data.usage_metadata.prompt_tokens | integer | Number of tokens in the input prompt
data.msg | string | Detailed message about the operation result
data.success | boolean | Indicates whether the description was successful
task_id | string | The task ID associated with this description request

Understanding the Callback Response

The callback response has a nested structure, with the description segments and usage information inside data.data.

Response Structure:
callback_response
├── code: 200
├── message: "SUCCESS"
├── data
│   ├── data
│   │   ├── data: [array of description segments]
│   │   │   └── [
│   │   │       {
│   │   │         start_time: 0.0,
│   │   │         end_time: 2.0,
│   │   │         transcript: "..."
│   │   │       },
│   │   │       ...
│   │   │     ]
│   │   ├── error_rate: 0.0
│   │   └── usage_metadata
│   │       ├── duration: 0.0
│   │       ├── model: "gemini-2.5-flash-lite"
│   │       ├── output_tokens: 24641
│   │       └── prompt_tokens: 283901
│   ├── msg: "Video transcription completed successfully"
│   └── success: true
└── task_id: "580e35a50faa437480b2d425bdcf1c87"
How to access the data:
  • Description segments: callback_response.data.data.data
  • First segment text: callback_response.data.data.data[0].transcript
  • Error rate: callback_response.data.data.error_rate
  • Usage statistics: callback_response.data.data.usage_metadata
  • Model used: callback_response.data.data.usage_metadata.model
  • Token counts: callback_response.data.data.usage_metadata.output_tokens and callback_response.data.data.usage_metadata.prompt_tokens
  • Success status: callback_response.data.success
  • Task ID: callback_response.task_id
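The access paths above can be collected into one small parsing sketch. The helper name summarize_callback and the sample transcript text are illustrative; the field names follow the callback table on this page:

```python
# Walk the nested callback structure and pull out the useful fields.
def summarize_callback(cb: dict) -> dict:
    inner = cb["data"]["data"]   # segments + error_rate + usage_metadata
    segments = inner["data"]     # list of {start_time, end_time, transcript}
    usage = inner["usage_metadata"]
    return {
        "task_id": cb["task_id"],
        "ok": cb["data"]["success"],
        "segments": len(segments),
        "first_text": segments[0]["transcript"] if segments else None,
        "tokens": (usage["prompt_tokens"], usage["output_tokens"]),
    }

# Sample payload mirroring the structure shown above
# (the transcript text is a placeholder):
sample = {
    "code": 200,
    "message": "SUCCESS",
    "data": {
        "data": {
            "data": [{"start_time": 0.0, "end_time": 2.0,
                      "transcript": "example segment text"}],
            "error_rate": 0.0,
            "usage_metadata": {"duration": 0.0,
                               "model": "gemini-2.5-flash-lite",
                               "output_tokens": 24641,
                               "prompt_tokens": 283901},
        },
        "msg": "Video transcription completed successfully",
        "success": True,
    },
    "task_id": "580e35a50faa437480b2d425bdcf1c87",
}
print(summarize_callback(sample))
```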

Authorizations

Authorization (string, header, required)

Body

application/json

asset_id (string, required)
The video asset ID to transcribe.
Example: "re_657929111888723968"

model (string, required)
The transcription model to use.
Example: "gemini-2.5-flash-lite"

Response

200 - application/json

Transcription task information.

code (string)
Response code indicating the result status.
Example: 200

msg (string)
Response message describing the operation result.
Example: "success"

data (object)
Response data object containing task information.

success (boolean)
Indicates whether the operation was successful.
Example: true

failed (boolean)
Indicates whether the operation failed.
Example: false