YouTube Video MAI Transcript

This endpoint starts a transcription task for a YouTube video using the MAI method. The transcript itself is delivered later through a callback to your configured webhook URL.

Code Example

import requests

BASE_URL = "https://mavi-backend.memories.ai/serve/api/v2"
API_KEY = "sk-8483027fe3abfe535f6ae01a9979b4f7"  # replace with your own API key
# The API key is sent as-is in the Authorization header.
HEADERS = {
    "Authorization": API_KEY
}

def youtube_video_mai_transcript(video_url: str):
    """Submit a YouTube video for MAI transcription and return the parsed JSON response."""
    url = f"{BASE_URL}/youtube/video/mai/transcript"
    data = {"video_url": video_url}
    resp = requests.post(url, headers=HEADERS, json=data)
    return resp.json()

# Usage example
result = youtube_video_mai_transcript("https://www.youtube.com/shorts/m8sOA8MxmQE")
print(result)

Response

Returns the transcription task information.

{
  "code": "0000",
  "msg": "success",
  "data": {
    "task_id": "1cd78354af824c8eb1dafe4ed2435720"
  },
  "failed": false,
  "success": true
}

Response Parameters

Parameter	Type	Description
code	string	Response code indicating the result status
msg	string	Response message describing the operation result
data	object	Response data object containing task information
data.task_id	string	Unique identifier of the transcription task
success	boolean	Indicates whether the operation was successful
failed	boolean	Indicates whether the operation failed
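
The fields above can be checked before the task ID is stored for later matching against the callback. The following is a minimal sketch, not part of the official client: the helper name extract_task_id is hypothetical, and it assumes that a successful call always carries code "0000" and success set to true, as in the sample response.

```python
from typing import Any, Dict


def extract_task_id(response: Dict[str, Any]) -> str:
    """Return data.task_id from a submit response, or raise if the call failed."""
    # Assumption: success is signalled by code "0000" and success == True,
    # matching the sample response shown above.
    if response.get("code") != "0000" or not response.get("success", False):
        raise RuntimeError(
            f"Transcript request failed: code={response.get('code')!r}, "
            f"msg={response.get('msg')!r}"
        )
    data = response.get("data") or {}
    task_id = data.get("task_id")
    if not task_id:
        raise RuntimeError("Response did not include data.task_id")
    return task_id


# Sample response copied from the Response section.
sample = {
    "code": "0000",
    "msg": "success",
    "data": {"task_id": "1cd78354af824c8eb1dafe4ed2435720"},
    "failed": False,
    "success": True,
}
print(extract_task_id(sample))
```

Raising on failure keeps a missing or rejected task from silently producing a task ID of None further down the pipeline.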

Callback Response Parameters

When the YouTube video transcription is complete, a callback will be sent to your configured webhook URL.

Parameter	Type	Description
code	string	Response code (“0000” indicates success)
message	string	Status message (e.g., “SUCCESS”)
data	object	Response data object containing both video and audio transcription results
data.videoTranscript	object	Video transcription result object
data.videoTranscript.data	object	Inner data object containing video transcription segments and usage information
data.videoTranscript.data.data	array	Array of video transcription segments with timestamps
data.videoTranscript.data.data[].start_time	number	Start time of the video segment in seconds
data.videoTranscript.data.data[].end_time	number	End time of the video segment in seconds
data.videoTranscript.data.data[].transcript	string	Video transcription text describing the visual content
data.videoTranscript.data.error_rate	number	Error rate of the video transcription (0.0 means no errors)
data.videoTranscript.data.usage_metadata	object	Usage statistics for the video transcription
data.videoTranscript.data.usage_metadata.duration	number	Processing duration in seconds
data.videoTranscript.data.usage_metadata.model	string	The AI model used for video transcription (e.g., “openai/gpt-5-mini”)
data.videoTranscript.data.usage_metadata.output_tokens	integer	Number of tokens in the generated video transcription
data.videoTranscript.data.usage_metadata.prompt_tokens	integer	Number of tokens in the input prompt
data.videoTranscript.msg	string	Detailed message about the video transcription result
data.videoTranscript.success	boolean	Indicates whether the video transcription was successful
data.audioTranscript	object	Audio transcription result object
data.audioTranscript.data	object	Inner data object containing audio transcription segments and usage information
data.audioTranscript.data.data	array	Array of audio transcription segments with timestamps
data.audioTranscript.data.data[].start_time	number	Start time of the audio segment in seconds
data.audioTranscript.data.data[].end_time	number	End time of the audio segment in seconds
data.audioTranscript.data.data[].text	string	Audio transcription text for this segment
data.audioTranscript.data.data[].speaker	string | null	Speaker identifier (null if speaker identification is not enabled)
data.audioTranscript.data.usage_metadata	object	Usage statistics for the audio transcription
data.audioTranscript.data.usage_metadata.duration	number	Audio duration in seconds
data.audioTranscript.data.usage_metadata.model	string	The model used for audio transcription (e.g., “whisper-1”)
data.audioTranscript.data.usage_metadata.output_tokens	integer	Number of output tokens (0 for audio transcription)
data.audioTranscript.data.usage_metadata.prompt_tokens	integer	Number of prompt tokens (0 for audio transcription)
data.audioTranscript.msg	string	Detailed message about the audio transcription result
data.audioTranscript.success	boolean	Indicates whether the audio transcription was successful
task_id	string	The task ID associated with this transcription request
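
To receive this payload you need an HTTP endpoint at your webhook URL. The sketch below uses only the Python standard library and rests on assumptions the documentation does not state: that the callback arrives as an HTTP POST with a JSON body, and that a 200 response is an acceptable acknowledgement. The names RESULTS, handle_callback, CallbackHandler, and run_server, as well as port 8000, are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict

# In-memory store of received callbacks keyed by task_id (illustrative only).
RESULTS: Dict[str, Dict[str, Any]] = {}


def handle_callback(raw_body: bytes) -> str:
    """Parse a callback body, store it under its task_id, and return the task_id."""
    payload = json.loads(raw_body.decode("utf-8"))
    task_id = payload["task_id"]
    RESULTS[task_id] = payload
    return task_id


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Assumption: the callback is delivered as a POST with a JSON body.
        length = int(self.headers.get("Content-Length", 0))
        try:
            handle_callback(self.rfile.read(length))
            self.send_response(200)
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, a non-object body, or a missing task_id.
            self.send_response(400)
        self.end_headers()


def run_server(port: int = 8000) -> None:
    """Serve callbacks until interrupted; call this from your own entry point."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Keeping the parsing in handle_callback, separate from the HTTP handler, lets you exercise it with a saved payload without starting a server.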

Understanding the Callback Response

The callback response has a nested structure, with both the video and audio transcription results inside data.

Response structure:

callback_response
├── code: "0000"
├── message: "SUCCESS"
├── data
│   ├── videoTranscript
│   │   ├── data
│   │   │   ├── data: [array of video transcription segments]
│   │   │   │   └── [
│   │   │   │       {
│   │   │   │         start_time: 0.0,
│   │   │   │         end_time: 2.0,
│   │   │   │         transcript: "..."
│   │   │   │       },
│   │   │   │       ...
│   │   │   │     ]
│   │   │   ├── error_rate: 0.0
│   │   │   └── usage_metadata
│   │   │       ├── duration: 0.0
│   │   │       ├── model: "openai/gpt-5-mini"
│   │   │       ├── output_tokens: 4134
│   │   │       └── prompt_tokens: 42951
│   │   ├── msg: "Video transcription completed successfully"
│   │   └── success: true
│   └── audioTranscript
│       ├── data
│       │   ├── data: [array of audio transcription segments]
│       │   │   └── [
│       │   │       {
│       │   │         start_time: 0.0,
│       │   │         end_time: 2.0,
│       │   │         text: " Oh",
│       │   │         speaker: null
│       │   │       },
│       │   │       ...
│       │   │     ]
│       │   └── usage_metadata
│       │       ├── duration: 2
│       │       ├── model: "whisper-1"
│       │       ├── output_tokens: 0
│       │       └── prompt_tokens: 0
│       ├── msg: "ASR transcription completed successfully"
│       └── success: true
└── task_id: "4330ce434e744cdb8e325c96a20e1460"

How to access the data:

Video transcription segments: callback_response.data.videoTranscript.data.data
First video segment text: callback_response.data.videoTranscript.data.data[0].transcript
Video error rate: callback_response.data.videoTranscript.data.error_rate
Video usage statistics: callback_response.data.videoTranscript.data.usage_metadata
Video model used: callback_response.data.videoTranscript.data.usage_metadata.model
Audio transcription segments: callback_response.data.audioTranscript.data.data
First audio segment text: callback_response.data.audioTranscript.data.data[0].text
First audio segment speaker: callback_response.data.audioTranscript.data.data[0].speaker
Audio usage statistics: callback_response.data.audioTranscript.data.usage_metadata
Audio model used: callback_response.data.audioTranscript.data.usage_metadata.model
Success status: callback_response.data.videoTranscript.success and callback_response.data.audioTranscript.success
Task ID: callback_response.task_id
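
The access paths above can be collected into one helper that tolerates missing branches. This is a minimal sketch assuming the callback has already been parsed into a Python dict; the function name summarize_callback and the keys of the returned summary are hypothetical, while the field names come from the structure shown above.

```python
from typing import Any, Dict


def summarize_callback(callback_response: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the task ID, success flags, and segment texts out of a callback payload."""
    data = callback_response.get("data") or {}
    video = data.get("videoTranscript") or {}
    audio = data.get("audioTranscript") or {}
    # Segments live two levels down: <branch>.data.data
    video_segments = (video.get("data") or {}).get("data") or []
    audio_segments = (audio.get("data") or {}).get("data") or []
    return {
        "task_id": callback_response.get("task_id"),
        "video_ok": bool(video.get("success")),
        "audio_ok": bool(audio.get("success")),
        # Video segments carry "transcript"; audio segments carry "text".
        "video_text": [seg["transcript"] for seg in video_segments],
        "audio_text": [seg["text"] for seg in audio_segments],
    }


# Abbreviated payload following the structure shown above.
callback_response = {
    "code": "0000",
    "message": "SUCCESS",
    "data": {
        "videoTranscript": {
            "data": {
                "data": [{"start_time": 0.0, "end_time": 2.0, "transcript": "..."}],
                "error_rate": 0.0,
            },
            "success": True,
        },
        "audioTranscript": {
            "data": {
                "data": [
                    {"start_time": 0.0, "end_time": 2.0, "text": " Oh", "speaker": None}
                ]
            },
            "success": True,
        },
    },
    "task_id": "4330ce434e744cdb8e325c96a20e1460",
}
print(summarize_callback(callback_response))
```

Note that the two branches use different keys for segment text (transcript for video, text for audio), which is easy to miss when iterating over both.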

Authorizations

Parameter	Type	In	Required
Authorization	string	header	yes

Body

application/json

Parameter	Type	Required	Description	Example
video_url	string	yes	The YouTube video URL	"https://www.youtube.com/shorts/m8sOA8MxmQE"

Response

200 - application/json

Transcription task information

Parameter	Type	Description	Example
code	string	Response code indicating the result status	"0000"
msg	string	Response message describing the operation result	"success"
data	object	Response data object containing task information	
success	boolean	Indicates whether the operation was successful	true
failed	boolean	Indicates whether the operation failed	false
