POST /transcriptions/async-generate-multi-speaker
Async Generate Multi Speaker
curl --request POST \
  --url https://mavi-backend.memories.ai/serve/api/v2/transcriptions/async-generate-multi-speaker \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "asset_id": "re_657929111888723968"
}
'
{
  "code": "0000",
  "msg": "success",
  "data": {
    "task_id": "ec2449885ba84c4f943a80ff0633158e"
  },
  "failed": false,
  "success": true
}
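This endpoint starts an asynchronous task that identifies the individual speakers in an asset. The response returns a task_id; the identification result is delivered later through a callback.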

Code Example

import requests

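BASE_URL = "https://mavi-backend.memories.ai/serve/api/v2/transcriptions"
API_KEY = "<api-key>"  # replace with your own API key; do not commit a real key
HEADERS = {
    "Authorization": API_KEY
}

def async_generate_multi_speaker(asset_id: str) -> dict:
    """Submit an asynchronous multi-speaker identification task for an asset."""
    url = f"{BASE_URL}/async-generate-multi-speaker"
    data = {"asset_id": asset_id}
    # A timeout keeps the call from hanging indefinitely on network problems.
    resp = requests.post(url, json=data, headers=HEADERS, timeout=30)
    return resp.json()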

# Usage example
result = async_generate_multi_speaker("re_657929111888723968")
print(result)

Response

Returns the multi-speaker identification task information.
{
  "code": "0000",
  "msg": "success",
  "data": {
    "task_id": "ec2449885ba84c4f943a80ff0633158e"
  },
  "failed": false,
  "success": true
}

Response Parameters

| Parameter | Type | Description |
|---|---|---|
| code | string | Response code indicating the result status |
| msg | string | Response message describing the operation result |
| data | object | Response data object containing task information |
| data.task_id | string | Unique identifier of the multi-speaker identification task |
| success | boolean | Indicates whether the operation was successful |
| failed | boolean | Indicates whether the operation failed |
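
As an illustration of how these fields might be checked before storing the task ID, here is a minimal sketch. The helper name extract_task_id is hypothetical, and treating "0000" together with success=true as the only successful outcome is an assumption drawn from the example response rather than documented behavior.

```python
def extract_task_id(response: dict) -> str:
    """Return data.task_id from a response, or raise if the call did not succeed."""
    # Assumption: code "0000" with success=True marks a successful submission.
    ok = response.get("code") == "0000" and response.get("success") is True
    if not ok or response.get("failed"):
        raise RuntimeError(f"Request failed: {response.get('msg')!r}")
    return response["data"]["task_id"]


# Sample taken from the example response above.
sample = {
    "code": "0000",
    "msg": "success",
    "data": {"task_id": "ec2449885ba84c4f943a80ff0633158e"},
    "failed": False,
    "success": True,
}
print(extract_task_id(sample))
```

Keeping the returned task_id lets you match the later callback, which carries the same identifier in its task_id field.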

Callback Response Parameters

When the multi-speaker identification is complete, a callback will be sent to your configured webhook URL.
| Parameter | Type | Description |
|---|---|---|
| code | string | Response code ("0000" indicates success) |
| message | string | Status message (e.g., "SUCCESS") |
| data | object | Response data object containing the multimodal ASR result and metadata |
| data.data | object | Inner data object containing transcription, faces, and usage information |
| data.data.audio_transcription | array | Array of transcription segments with speaker identification |
| data.data.audio_transcription[].start_time | number | Start time of the segment in seconds |
| data.data.audio_transcription[].end_time | number | End time of the segment in seconds |
| data.data.audio_transcription[].speaker | string | Identified speaker name |
| data.data.audio_transcription[].text | string | Transcription text for this segment |
| data.data.faces | array | Array of detected faces with metadata |
| data.data.faces[].face_id | string | Unique identifier for the detected face |
| data.data.faces[].name | string | Identified name of the person |
| data.data.faces[].face_file_protocol | string | Storage protocol (e.g., "gs" for Google Cloud Storage) |
| data.data.faces[].face_file_bucket | string | Storage bucket name |
| data.data.faces[].face_file_blob | string | File path in the storage bucket |
| data.data.usage_metadata | array | Array of usage statistics for different models used |
| data.data.usage_metadata[].duration | number | Processing duration in seconds |
| data.data.usage_metadata[].model | string | The AI model used (e.g., "gemini-2.5-pro", "gemini-2.5-flash") |
| data.data.usage_metadata[].output_tokens | integer | Number of tokens in the output |
| data.data.usage_metadata[].prompt_tokens | integer | Number of tokens in the input prompt |
| data.msg | string | Detailed message about the operation result |
| data.success | boolean | Indicates whether the multimodal ASR was successful |
| task_id | string | The task ID associated with this multi-speaker identification request |
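
To make the callback flow more concrete, the following is a rough sketch of a webhook receiver built only on the Python standard library. It assumes the callback arrives as an HTTP POST with a JSON body shaped like the parameters listed here; the delivery method, the names parse_callback, CallbackHandler, and serve, and the port are all hypothetical and not part of the documented API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_callback(body: bytes) -> dict:
    """Decode a callback body into task ID, status, message, and inner result."""
    payload = json.loads(body)
    outer = payload.get("data") or {}
    return {
        "task_id": payload.get("task_id"),
        # Assumption: success means code "0000" and data.success is true.
        "success": payload.get("code") == "0000" and outer.get("success") is True,
        "message": outer.get("msg") or payload.get("message"),
        "result": outer.get("data") or {},
    }


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            parsed = parse_callback(self.rfile.read(length))
        except (ValueError, AttributeError):
            # Body was not valid JSON or not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        print(parsed["task_id"], parsed["success"], parsed["message"])
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block forever, handling callbacks on the given port (hypothetical choice)."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Calling serve() starts the listener. The parsing is kept in a separate function so it can be reused or tested without running a server.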

Understanding the Callback Response

The callback response has a nested structure with audio transcription, face detection results, and usage information inside data.data.

Response Structure:
callback_response
├── code: "0000"
├── message: "SUCCESS"
├── data
│   ├── data
│   │   ├── audio_transcription: [array of transcription segments]
│   │   │   └── [
│   │   │       {
│   │   │         start_time: 0.0,
│   │   │         end_time: 1.8,
│   │   │         speaker: "Kiara S Stepsister",
│   │   │         text: "You wolfless Omega! Clean that up, you jinx!"
│   │   │       },
│   │   │       ...
│   │   │     ]
│   │   ├── faces: [array of detected faces]
│   │   │   └── [
│   │   │       {
│   │   │         face_id: "9e545636-509a-4a7d-b7c8-6359ea6a6d8b_person_001",
│   │   │         name: "Kiara S Stepsister",
│   │   │         face_file_protocol: "gs",
│   │   │         face_file_bucket: "memories-cache",
│   │   │         face_file_blob: "api-backend/.../9e545636_batch_1_video_9_person_001.jpg"
│   │   │       },
│   │   │       ...
│   │   │     ]
│   │   └── usage_metadata: [array of usage stats]
│   │       └── [
│   │           {
│   │             duration: 0.0,
│   │             model: "gemini-2.5-pro",
│   │             output_tokens: 6143,
│   │             prompt_tokens: 442368
│   │           },
│   │           ...
│   │         ]
│   ├── msg: "Multimodal ASR completed successfully"
│   └── success: true
└── task_id: "29799938cfd344db8e10243a266b9990"
How to access the data:
  • Audio transcription segments: callback_response.data.data.audio_transcription
  • First segment speaker: callback_response.data.data.audio_transcription[0].speaker
  • First segment text: callback_response.data.data.audio_transcription[0].text
  • Detected faces: callback_response.data.data.faces
  • First face name: callback_response.data.data.faces[0].name
  • First face image path: callback_response.data.data.faces[0].face_file_blob
  • Usage statistics: callback_response.data.data.usage_metadata
  • Models used: callback_response.data.data.usage_metadata[i].model
  • Success status: callback_response.data.success
  • Task ID: callback_response.task_id
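
The access paths above can be sketched in Python as follows, treating the callback as an already-decoded dictionary. The function name summarize_callback and the shape of its return value are illustrative choices, not part of the API; the sample data reuses the values from the structure shown earlier, with the elided path left as-is.

```python
from collections import defaultdict


def summarize_callback(callback_response: dict) -> dict:
    """Collect per-speaker lines, face names, and token totals from a callback."""
    inner = callback_response.get("data", {}).get("data", {})

    # Group transcription text by the identified speaker.
    lines_by_speaker = defaultdict(list)
    for segment in inner.get("audio_transcription", []):
        lines_by_speaker[segment["speaker"]].append(segment["text"])

    face_names = [face["name"] for face in inner.get("faces", [])]

    # Sum prompt and output tokens across every model entry.
    usage = inner.get("usage_metadata", [])
    total_tokens = sum(u["prompt_tokens"] + u["output_tokens"] for u in usage)

    return {
        "task_id": callback_response.get("task_id"),
        "lines_by_speaker": dict(lines_by_speaker),
        "face_names": face_names,
        "total_tokens": total_tokens,
    }


callback_response = {
    "code": "0000",
    "message": "SUCCESS",
    "data": {
        "data": {
            "audio_transcription": [{
                "start_time": 0.0,
                "end_time": 1.8,
                "speaker": "Kiara S Stepsister",
                "text": "You wolfless Omega! Clean that up, you jinx!",
            }],
            "faces": [{
                "face_id": "9e545636-509a-4a7d-b7c8-6359ea6a6d8b_person_001",
                "name": "Kiara S Stepsister",
                "face_file_protocol": "gs",
                "face_file_bucket": "memories-cache",
                "face_file_blob": "api-backend/.../9e545636_batch_1_video_9_person_001.jpg",
            }],
            "usage_metadata": [{
                "duration": 0.0,
                "model": "gemini-2.5-pro",
                "output_tokens": 6143,
                "prompt_tokens": 442368,
            }],
        },
        "msg": "Multimodal ASR completed successfully",
        "success": True,
    },
    "task_id": "29799938cfd344db8e10243a266b9990",
}
print(summarize_callback(callback_response))
```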

Authorizations

Authorization (string, header, required)

Body

application/json
asset_id (string, required)
The asset ID to identify multiple speakers for
Example: "re_657929111888723968"

Response

200 - application/json

Multi-speaker identification task information

code (string)
Response code indicating the result status
Example: "0000"

msg (string)
Response message describing the operation result
Example: "success"

data (object)
Response data object containing task information

success (boolean)
Indicates whether the operation was successful
Example: true

failed (boolean)
Indicates whether the operation failed
Example: false