POST /instagram/video/mai/transcript
Instagram Video MAI Transcript
curl --request POST \
  --url https://mavi-backend.memories.ai/serve/api/v2/instagram/video/mai/transcript \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "video_url": "https://www.instagram.com/reels/DLlGZiCOBQ0/"
}
'
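This endpoint submits a transcription task for an Instagram video using the MAI method. The response contains a task ID; the transcript itself is delivered later through a callback to your configured webhook URL.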

Code Example

import requests

BASE_URL = "https://mavi-backend.memories.ai/serve/api/v2"
API_KEY = "<api-key>"  # replace with your own key; avoid committing real keys to source control
HEADERS = {"Authorization": API_KEY}

def instagram_video_mai_transcript(video_url: str) -> dict:
    """Submit an Instagram video for MAI transcription and return the parsed JSON response."""
    url = f"{BASE_URL}/instagram/video/mai/transcript"
    data = {"video_url": video_url}
    # A timeout keeps the call from hanging indefinitely if the server does not respond.
    resp = requests.post(url, headers=HEADERS, json=data, timeout=30)
    return resp.json()

# Usage example
result = instagram_video_mai_transcript("https://www.instagram.com/reels/DLlGZiCOBQ0/")
print(result)

Response

Returns the transcription task information.
{
  "code": "0000",
  "msg": "success",
  "data": {
    "task_id": "1cd78354af824c8eb1dafe4ed2435720"
  },
  "failed": false,
  "success": true
}

Response Parameters

| Parameter | Type | Description |
|---|---|---|
| code | string | Response code indicating the result status |
| msg | string | Response message describing the operation result |
| data | object | Response data object containing task information |
| data.task_id | string | Unique identifier of the transcription task |
| success | boolean | Indicates whether the operation was successful |
| failed | boolean | Indicates whether the operation failed |
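
The fields above can be checked before the task ID is stored. The following is a minimal sketch, not part of the API: the helper name `extract_task_id` is hypothetical, and it assumes that code "0000" together with success = true means the task was accepted, as in the example response.

```python
def extract_task_id(response: dict) -> str:
    """Return data.task_id from a submit response, or raise if the request failed.

    Assumption: code "0000" with success == True means the task was accepted,
    as in the example response.
    """
    if response.get("code") != "0000" or not response.get("success", False):
        raise RuntimeError(f"Transcription request failed: {response.get('msg')!r}")
    task_id = (response.get("data") or {}).get("task_id")
    if not task_id:
        raise RuntimeError("Response did not include data.task_id")
    return task_id


# The example response from this page, written as a Python dict.
example = {
    "code": "0000",
    "msg": "success",
    "data": {"task_id": "1cd78354af824c8eb1dafe4ed2435720"},
    "failed": False,
    "success": True,
}
print(extract_task_id(example))
```

Keeping the returned task ID lets you match the later callback to the request that produced it.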

Callback Response Parameters

When the Instagram video transcription is complete, a callback will be sent to your configured webhook URL.
| Parameter | Type | Description |
|---|---|---|
| code | string | Response code ("0000" indicates success) |
| message | string | Status message (e.g., "SUCCESS") |
| data | object | Response data object containing both video and audio transcription results |
| data.videoTranscript | object | Video transcription result object |
| data.videoTranscript.data | object | Inner data object containing video transcription segments and usage information |
| data.videoTranscript.data.data | array | Array of video transcription segments with timestamps |
| data.videoTranscript.data.data[].start_time | number | Start time of the video segment in seconds |
| data.videoTranscript.data.data[].end_time | number | End time of the video segment in seconds |
| data.videoTranscript.data.data[].transcript | string | Video transcription text describing the visual content |
| data.videoTranscript.data.error_rate | number | Error rate of the video transcription (0.0 means no errors) |
| data.videoTranscript.data.usage_metadata | object | Usage statistics for the video transcription |
| data.videoTranscript.data.usage_metadata.duration | number | Processing duration in seconds |
| data.videoTranscript.data.usage_metadata.model | string | The AI model used for video transcription (e.g., "gemini-2.5-flash") |
| data.videoTranscript.data.usage_metadata.output_tokens | integer | Number of tokens in the generated video transcription |
| data.videoTranscript.data.usage_metadata.prompt_tokens | integer | Number of tokens in the input prompt |
| data.videoTranscript.msg | string | Detailed message about the video transcription result |
| data.videoTranscript.success | boolean | Indicates whether the video transcription was successful |
| data.audioTranscript | object | Audio transcription result object |
| data.audioTranscript.data | object | Inner data object containing audio transcription segments and usage information |
| data.audioTranscript.data.data | array | Array of audio transcription segments with timestamps |
| data.audioTranscript.data.data[].start_time | number | Start time of the audio segment in seconds |
| data.audioTranscript.data.data[].end_time | number | End time of the audio segment in seconds |
| data.audioTranscript.data.data[].text | string | Audio transcription text for this segment |
| data.audioTranscript.data.data[].speaker | string \| null | Speaker identifier (null if speaker identification is not enabled) |
| data.audioTranscript.data.usage_metadata | object | Usage statistics for the audio transcription |
| data.audioTranscript.data.usage_metadata.duration | number | Audio duration in seconds |
| data.audioTranscript.data.usage_metadata.model | string | The model used for audio transcription (e.g., "whisper-1") |
| data.audioTranscript.data.usage_metadata.output_tokens | integer | Number of output tokens (0 for audio transcription) |
| data.audioTranscript.data.usage_metadata.prompt_tokens | integer | Number of prompt tokens (0 for audio transcription) |
| data.audioTranscript.msg | string | Detailed message about the audio transcription result |
| data.audioTranscript.success | boolean | Indicates whether the audio transcription was successful |
| task_id | string | The task ID associated with this transcription request |
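
To receive the callback you need an HTTP endpoint at your webhook URL. The sketch below uses only the Python standard library and rests on assumptions this page does not state: that the callback arrives as an HTTP POST with a JSON body, and that a 200 status is an acceptable acknowledgement. The names `RESULTS`, `handle_callback`, `CallbackHandler`, and `run` are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store keyed by task_id; a real service would persist results.
RESULTS: dict = {}


def handle_callback(body: bytes) -> str:
    """Parse a callback body, store it under its task_id, and return that task_id."""
    payload = json.loads(body)
    task_id = payload["task_id"]
    RESULTS[task_id] = payload
    return task_id


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: the callback is delivered as an HTTP POST with a JSON body.
        length = int(self.headers.get("Content-Length", 0))
        try:
            handle_callback(self.rfile.read(length))
        except (ValueError, KeyError, TypeError):
            self.send_response(400)  # malformed JSON or missing task_id
        else:
            self.send_response(200)
        self.end_headers()


def run(port: int = 8000) -> None:
    """Serve callbacks until interrupted. The port is an arbitrary choice."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Calling `run()` starts the listener. Separating `handle_callback` from the HTTP handler keeps the parsing logic testable without opening a socket.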

Understanding the Callback Response

The callback response has a nested structure, with both the video and audio transcription results inside data.

Response structure:
callback_response
├── code: "0000"
├── message: "SUCCESS"
├── data
│   ├── videoTranscript
│   │   ├── data
│   │   │   ├── data: [array of video transcription segments]
│   │   │   │   └── [
│   │   │   │       {
│   │   │   │         start_time: 0.0,
│   │   │   │         end_time: 9.0,
│   │   │   │         transcript: "..."
│   │   │   │       },
│   │   │   │       ...
│   │   │   │     ]
│   │   │   ├── error_rate: 0.0
│   │   │   └── usage_metadata
│   │   │       ├── duration: 0.0
│   │   │       ├── model: "gemini-2.5-flash"
│   │   │       ├── output_tokens: 813
│   │   │       └── prompt_tokens: 15160
│   │   ├── msg: "Video transcription completed successfully"
│   │   └── success: true
│   └── audioTranscript
│       ├── data
│       │   ├── data: [array of audio transcription segments]
│       │   │   └── [
│       │   │       {
│       │   │         start_time: 0.0,
│       │   │         end_time: 2.44,
│       │   │         text: " I'm going to get a little personal.",
│       │   │         speaker: null
│       │   │       },
│       │   │       ...
│       │   │     ]
│       │   └── usage_metadata
│       │       ├── duration: 41.235782
│       │       ├── model: "whisper-1"
│       │       ├── output_tokens: 0
│       │       └── prompt_tokens: 0
│       ├── msg: "ASR transcription completed successfully"
│       └── success: true
└── task_id: "8b4e80ea9774438c83b681dc427a310c"
How to access the data:
  • Video transcription segments: callback_response.data.videoTranscript.data.data
  • First video segment text: callback_response.data.videoTranscript.data.data[0].transcript
  • Video error rate: callback_response.data.videoTranscript.data.error_rate
  • Video usage statistics: callback_response.data.videoTranscript.data.usage_metadata
  • Video model used: callback_response.data.videoTranscript.data.usage_metadata.model
  • Audio transcription segments: callback_response.data.audioTranscript.data.data
  • First audio segment text: callback_response.data.audioTranscript.data.data[0].text
  • First audio segment speaker: callback_response.data.audioTranscript.data.data[0].speaker
  • Audio usage statistics: callback_response.data.audioTranscript.data.usage_metadata
  • Audio model used: callback_response.data.audioTranscript.data.usage_metadata.model
  • Success status: callback_response.data.videoTranscript.success and callback_response.data.audioTranscript.success
  • Task ID: callback_response.task_id
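
The access paths above translate directly into dictionary lookups once the callback body has been parsed as JSON. The following is a minimal sketch under that assumption; the function name `summarize_callback` and the keys of the dictionary it returns are hypothetical, and the sample payload is an abbreviated version of the structure shown above.

```python
def summarize_callback(callback_response: dict) -> dict:
    """Collect the most commonly needed fields from a parsed callback payload."""
    data = callback_response.get("data") or {}
    video = data.get("videoTranscript") or {}
    audio = data.get("audioTranscript") or {}
    # Segments live two levels down: <result>.data.data
    video_segments = (video.get("data") or {}).get("data") or []
    audio_segments = (audio.get("data") or {}).get("data") or []
    return {
        "task_id": callback_response.get("task_id"),
        "video_ok": bool(video.get("success")),
        "audio_ok": bool(audio.get("success")),
        # Visual descriptions use the "transcript" key; spoken words use "text".
        "visual_description": " ".join(s["transcript"] for s in video_segments),
        "spoken_text": " ".join(s["text"].strip() for s in audio_segments),
    }


# Abbreviated sample payload mirroring the structure above.
sample = {
    "code": "0000",
    "message": "SUCCESS",
    "data": {
        "videoTranscript": {
            "data": {"data": [{"start_time": 0.0, "end_time": 9.0, "transcript": "..."}]},
            "success": True,
        },
        "audioTranscript": {
            "data": {
                "data": [
                    {
                        "start_time": 0.0,
                        "end_time": 2.44,
                        "text": " I'm going to get a little personal.",
                        "speaker": None,
                    }
                ]
            },
            "success": True,
        },
    },
    "task_id": "8b4e80ea9774438c83b681dc427a310c",
}
print(summarize_callback(sample))
```

Checking the two success flags separately matters because the video and audio results are reported independently, so one may be present without the other.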

Authorizations

Authorization (string, header, required)

Body

Content type: application/json

video_url (string, required)

The Instagram video URL

Example: "https://www.instagram.com/reels/DLlGZiCOBQ0/"

Response

200 - application/json

Transcription task information

code (string)
Response code indicating the result status. Example: "0000"

msg (string)
Response message describing the operation result. Example: "success"

data (object)
Response data object containing task information

success (boolean)
Indicates whether the operation was successful. Example: true

failed (boolean)
Indicates whether the operation failed. Example: false