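Submit a multi-speaker identification task asynchronously. The response returns the task's ID rather than the identification result itself.

Response parameters:

| Parameter | Type | Description |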
|---|---|---|
| code | string | Response code indicating the result status |
| msg | string | Response message describing the operation result |
| data | object | Response data object containing task information |
| data.task_id | string | Unique identifier of the multi-speaker identification task |
| success | boolean | Indicates whether the operation was successful |
| failed | boolean | Indicates whether the operation failed |
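
As a minimal sketch of how a client might read the response fields listed above, the following Python helper checks the `success` and `failed` flags and returns `data.task_id`. The function name `extract_task_id` and the sample payload (including the placeholder task ID) are hypothetical and only illustrate the field layout; they are not part of the API.

```python
from typing import Any


def extract_task_id(response: dict[str, Any]) -> str:
    """Return data.task_id from a submit response, or raise if the call failed."""
    # Treat the call as failed if either flag reports a failure.
    if response.get("failed") or not response.get("success"):
        raise RuntimeError(
            f"request failed: code={response.get('code')} msg={response.get('msg')}"
        )
    task_id = (response.get("data") or {}).get("task_id")
    if not task_id:
        raise ValueError("response has no data.task_id")
    return task_id


# Hypothetical sample response; the task_id value is a placeholder.
sample = {
    "code": "0000",
    "msg": "success",
    "data": {"task_id": "task-123"},
    "success": True,
    "failed": False,
}
print(extract_task_id(sample))
```

Checking both flags is a defensive choice here, since the table documents them as separate fields.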

Callback response parameters:

| Parameter | Type | Description |
|---|---|---|
| code | string | Response code (“0000” indicates success) |
| message | string | Status message (e.g., “SUCCESS”) |
| data | object | Response data object containing the multimodal ASR result and metadata |
| data.data | object | Inner data object containing transcription, faces, and usage information |
| data.data.audio_transcription | array | Array of transcription segments with speaker identification |
| data.data.audio_transcription[].start_time | number | Start time of the segment in seconds |
| data.data.audio_transcription[].end_time | number | End time of the segment in seconds |
| data.data.audio_transcription[].speaker | string | Identified speaker name |
| data.data.audio_transcription[].text | string | Transcription text for this segment |
| data.data.faces | array | Array of detected faces with metadata |
| data.data.faces[].face_id | string | Unique identifier for the detected face |
| data.data.faces[].name | string | Identified name of the person |
| data.data.faces[].face_file_protocol | string | Storage protocol (e.g., “gs” for Google Cloud Storage) |
| data.data.faces[].face_file_bucket | string | Storage bucket name |
| data.data.faces[].face_file_blob | string | File path in the storage bucket |
| data.data.usage_metadata | array | Array of usage statistics for different models used |
| data.data.usage_metadata[].duration | number | Processing duration in seconds |
| data.data.usage_metadata[].model | string | The AI model used (e.g., “gemini-2.5-pro”, “gemini-2.5-flash”) |
| data.data.usage_metadata[].output_tokens | integer | Number of tokens in the output |
| data.data.usage_metadata[].prompt_tokens | integer | Number of tokens in the input prompt |
| data.msg | string | Detailed message about the operation result |
| data.success | boolean | Indicates whether the multimodal ASR was successful |
| task_id | string | The task ID associated with this multi-speaker identification request |
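
The nested layout in the table above can be easier to follow with a small example. The Python sketch below reads a callback payload, formats the transcription segments, builds a storage URI for each face, and sums token usage per model. All function names and the sample payload are hypothetical. Joining the face fields as `protocol://bucket/blob` is an assumption based on the usual `gs://` URI form, not something the table confirms.

```python
from typing import Any


def _inner(callback: dict[str, Any]) -> dict[str, Any]:
    """Return the data.data object, raising if data.success is false."""
    outer = callback["data"]
    if not outer.get("success"):
        raise RuntimeError(f"task {callback.get('task_id')} failed: {outer.get('msg')}")
    return outer["data"]


def format_transcript(callback: dict[str, Any]) -> list[str]:
    """One line per segment: [start-end] speaker: text."""
    return [
        f"[{seg['start_time']:.1f}-{seg['end_time']:.1f}] {seg['speaker']}: {seg['text']}"
        for seg in _inner(callback).get("audio_transcription", [])
    ]


def face_uris(callback: dict[str, Any]) -> dict[str, str]:
    """Map face_id to a URI. Assumption: fields join as protocol://bucket/blob."""
    return {
        face["face_id"]: (
            f"{face['face_file_protocol']}://"
            f"{face['face_file_bucket']}/{face['face_file_blob']}"
        )
        for face in _inner(callback).get("faces", [])
    }


def total_tokens(callback: dict[str, Any]) -> dict[str, int]:
    """Sum prompt_tokens + output_tokens per model across usage_metadata."""
    totals: dict[str, int] = {}
    for entry in _inner(callback).get("usage_metadata", []):
        used = entry["prompt_tokens"] + entry["output_tokens"]
        totals[entry["model"]] = totals.get(entry["model"], 0) + used
    return totals


# Hypothetical callback payload; every value here is a placeholder.
sample_callback = {
    "code": "0000",
    "message": "SUCCESS",
    "data": {
        "data": {
            "audio_transcription": [
                {"start_time": 0.0, "end_time": 2.5, "speaker": "Alice", "text": "Hello."},
            ],
            "faces": [
                {
                    "face_id": "f1",
                    "name": "Alice",
                    "face_file_protocol": "gs",
                    "face_file_bucket": "example-bucket",
                    "face_file_blob": "faces/f1.jpg",
                },
            ],
            "usage_metadata": [
                {"duration": 1.2, "model": "gemini-2.5-pro", "output_tokens": 10, "prompt_tokens": 100},
                {"duration": 0.4, "model": "gemini-2.5-pro", "output_tokens": 5, "prompt_tokens": 20},
            ],
        },
        "msg": "ok",
        "success": True,
    },
    "task_id": "task-123",
}

print(format_transcript(sample_callback))
print(face_uris(sample_callback))
print(total_tokens(sample_callback))
```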
Response Structure (result fields are nested under data.data):

- callback_response.data.data.audio_transcription
- callback_response.data.data.audio_transcription[0].speaker
- callback_response.data.data.audio_transcription[0].text
- callback_response.data.data.faces
- callback_response.data.data.faces[0].name
- callback_response.data.data.faces[0].face_file_blob
- callback_response.data.data.usage_metadata
- callback_response.data.data.usage_metadata[i].model
- callback_response.data.success
- callback_response.task_id

The asset ID to identify multiple speakers for. Example: "re_657929111888723968"

Multi-speaker identification task information

| Parameter | Description | Example |
|---|---|---|
| code | Response code indicating the result status | "0000" |
| msg | Response message describing the operation result | "success" |
| data | Response data object containing task information | |
| success | Indicates whether the operation was successful | true |
| failed | Indicates whether the operation failed | false |