Product: Visual Intelligence (Audio File Transcription)
Use case: Transcribe an uploaded audio or video file to text, either async (batch) or sync, with multiple providers (Whisper, ElevenLabs, AssemblyAI) and optional speaker labels. For live streams, see Live Audio Transcription.
Host: https://mavi-backend.memories.ai/serve/api/v2
Auth: Authorization: sk-mavi-... (no Bearer prefix)

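Segment audio or video by speaker using pyannote. The endpoints return timestamped speaker turns labeled SPEAKER_00, SPEAKER_01, and so on. These labels are anonymous: they group speech by voice characteristics and do not identify who is speaking. Their numbering also need not follow order of appearance (in the sample sync response, SPEAKER_01 speaks first). Need named speakers? Use Multimodal Speaker Recognition, which combines voice and face recognition to identify speakers by name.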
Pricing: $0.001 per second of audio or video (for example, $0.60 for a 10-minute file).
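
To estimate the charge before submitting a file, multiply its duration by the per-second rate. This is a minimal sketch that assumes billing is strictly proportional to duration, with no rounding to whole seconds and no minimum charge; the page does not say either way. `estimate_cost` is a hypothetical helper name.

```shell
#!/bin/sh
# Per-second rate taken from the pricing line on this page.
RATE_PER_SECOND=0.001

# estimate_cost SECONDS: print the estimated charge in US dollars.
# Assumes linear billing with no rounding or minimum charge (not confirmed by the docs).
estimate_cost() {
  awk -v seconds="$1" -v rate="$RATE_PER_SECOND" \
    'BEGIN { printf "%.4f\n", seconds * rate }'
}
```

For example, `estimate_cost 600` prints `0.6000`, matching the $0.60 figure for a 10-minute file.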

Endpoints

| Method | Endpoint | Returns |
| --- | --- | --- |
| POST | /transcriptions/sync-generate-speaker | Result directly |
| POST | /transcriptions/async-generate-speaker | task_id + webhook callback |

Use sync for short clips. Use async for long files.
The async endpoint requires a configured webhook URL. See Webhooks Settings and the Webhooks Guide.

Request Body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| asset_id | string | Yes | The uploaded audio or video asset ID |

Code Examples

```shell
curl --request POST \
  --url https://mavi-backend.memories.ai/serve/api/v2/transcriptions/sync-generate-speaker \
  --header 'Authorization: sk-mavi-...' \
  --header 'Content-Type: application/json' \
  --data '{ "asset_id": "re_657929111888723968" }'
```
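
The async endpoint takes the same request body and requires a configured webhook URL. The sketch below wraps the request in shell functions. `MAVI_API_KEY`, `speaker_body`, and `submit_async_speaker` are hypothetical names, and the body builder assumes the asset ID contains no characters that need JSON escaping (the `re_...` IDs shown on this page do not).

```shell
#!/bin/sh
# Host and path from the Endpoints table. MAVI_API_KEY is a hypothetical variable
# holding your sk-mavi-... key, which is sent without a Bearer prefix.
API_BASE="https://mavi-backend.memories.ai/serve/api/v2"

# speaker_body ASSET_ID: build the JSON request body.
# Assumes ASSET_ID needs no JSON escaping.
speaker_body() {
  printf '{ "asset_id": "%s" }' "$1"
}

# submit_async_speaker ASSET_ID: start an async speaker-segmentation job.
# The immediate response only carries a task_id; segments arrive later at your webhook.
submit_async_speaker() {
  curl --silent --show-error --request POST \
    --url "$API_BASE/transcriptions/async-generate-speaker" \
    --header "Authorization: $MAVI_API_KEY" \
    --header 'Content-Type: application/json' \
    --data "$(speaker_body "$1")"
}
```

If `jq` is installed, `submit_async_speaker re_657929111888723968 | jq -r '.data.task_id'` prints the task ID, which you can store and later match against `task_id` in the webhook callback.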

Sync Response

```json
{
  "code": 200,
  "msg": "success",
  "data": {
    "model": "pyannote",
    "items": [
      { "start": 0.03, "end": 0.13, "speaker": "SPEAKER_01" },
      { "start": 0.13, "end": 13.40, "speaker": "SPEAKER_00" }
    ]
  },
  "failed": false,
  "success": true
}
```

Sync Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| data.model | string | Model used (e.g., pyannote) |
| data.items | array | Speaker segments |
| data.items[].start | number | Segment start time in seconds |
| data.items[].end | number | Segment end time in seconds |
| data.items[].speaker | string | Anonymous speaker label (e.g., SPEAKER_00) |
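
Because each item in `data.items` is one speaker turn, total talk time per speaker is the sum of `end - start` over that speaker's items. A minimal sketch, assuming the sync response has been saved to a file and that `jq` is installed for the extraction step; `segments_from_sync` and `speaker_totals` are hypothetical helper names.

```shell
#!/bin/sh
# segments_from_sync FILE: flatten a saved sync response into "start end speaker" lines.
# Requires jq, which is an assumption about your environment.
segments_from_sync() {
  jq -r '.data.items[] | "\(.start) \(.end) \(.speaker)"' "$1"
}

# speaker_totals: read "start end speaker" lines on stdin and print
# "speaker seconds" lines, summing end - start per label, sorted by label.
speaker_totals() {
  awk '{ total[$3] += $2 - $1 }
       END { for (s in total) printf "%s %.2f\n", s, total[s] }' | sort
}
```

For the sample response, `segments_from_sync response.json | speaker_totals` prints `SPEAKER_00 13.27` followed by `SPEAKER_01 0.10`.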

Async Response

```json
{
  "code": 200,
  "msg": "success",
  "data": { "task_id": "ec2449885ba84c4f943a80ff0633158e" },
  "failed": false,
  "success": true
}
```

Callback Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| data.data.data | array | Speaker segments |
| data.data.data[].start | number | Segment start time in seconds |
| data.data.data[].end | number | Segment end time in seconds |
| data.data.data[].speaker | string | Anonymous speaker label |
| data.data.usage_metadata.duration | number | Total audio duration in seconds |
| data.data.usage_metadata.model | string | Model used |
| task_id | string | Task ID matching the initial response |
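
On the receiving side, the paths in this table are enough to check the task ID and pull out the segments and duration. This sketch assumes those paths are rooted at the top of the callback JSON body, that your webhook handler passes the body on stdin, and that `jq` is installed; `callback_summary` is a hypothetical name.

```shell
#!/bin/sh
# callback_summary [EXPECTED_TASK_ID] < callback.json
# Prints "task_id=... model=... duration=..." then one "start end speaker" line per segment.
# Field paths follow the Callback Response Parameters table; treating them as
# rooted at the top of the callback body is an assumption.
callback_summary() {
  expected=$1
  body=$(cat)  # read the whole callback body once so it can be queried twice
  task_id=$(printf '%s' "$body" | jq -r '.task_id') || return 1
  # Optionally reject callbacks for a task you did not submit.
  if [ -n "$expected" ] && [ "$task_id" != "$expected" ]; then
    echo "unexpected task_id: $task_id" >&2
    return 1
  fi
  printf '%s' "$body" | jq -r '
    "task_id=\(.task_id) model=\(.data.data.usage_metadata.model) duration=\(.data.data.usage_metadata.duration)",
    (.data.data.data[] | "\(.start) \(.end) \(.speaker)")'
}
```

For example, `callback_summary ec2449885ba84c4f943a80ff0633158e < callback.json` fails with a message on stderr if the callback belongs to a different task.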