Documentation Index

Fetch the complete documentation index at: https://api-tools.memories.ai/llms.txt

Use this file to discover all available pages before exploring further.

Product: Visual Intelligence - Audio File Transcription
Use case: Transcribe an uploaded audio/video file to text, in async batch or sync mode, with multiple providers (Whisper, ElevenLabs, AssemblyAI) and optional speaker labels. For live streams, see Live Audio Transcription.
Host: https://mavi-backend.memories.ai/serve/api/v2
Auth: Authorization: sk-mavi-... (no Bearer prefix)

Transcribe speech from audio or video files using OpenAI Whisper. Returns timestamped text segments. Add speaker: true to label each segment by speaker (doubles the price). Use this endpoint when you need fast, cost-effective speech-to-text on your own uploaded assets. For third-party providers with richer features (word-level confidence, entity detection, PII redaction), see ElevenLabs or AssemblyAI.
Pricing:
  • $0.001/second (without speaker labeling)
  • $0.002/second (with speaker: true)
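
The per-second rates above make cost easy to estimate before submitting a file. The following is a minimal sketch, assuming billing is strictly proportional to duration with no rounding or minimum charge (the page does not say either way). The function name `estimate_cost` is hypothetical and not part of the API.

```python
from decimal import Decimal

# Per-second rates copied from the pricing list above (USD).
RATE_PLAIN = Decimal("0.001")
RATE_WITH_SPEAKER = Decimal("0.002")


def estimate_cost(duration_seconds, speaker=False):
    """Estimate the charge in USD for transcribing one file.

    Assumes cost = duration * rate, with no rounding or minimum fee.
    """
    rate = RATE_WITH_SPEAKER if speaker else RATE_PLAIN
    # Go through str() so float inputs such as 2.5 convert exactly.
    return Decimal(str(duration_seconds)) * rate


if __name__ == "__main__":
    print(estimate_cost(600))                # 10-minute file, no speaker labels
    print(estimate_cost(600, speaker=True))  # same file with speaker labels
```

Under these assumptions a 10-minute file comes to $0.60 without speaker labels and $1.20 with them.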

Endpoints

Method | Endpoint                             | Returns
POST   | /transcriptions/sync-generate-audio  | Result directly
POST   | /transcriptions/async-generate-audio | task_id + webhook callback
Use sync for short clips where you want an immediate result. Use async for long files — you’ll receive the result via webhook when processing completes.
The async endpoint requires a configured webhook URL. See Webhooks Settings and the Webhooks Guide. Without a configured webhook, the async endpoint rejects requests with:
HTTP 400 {"code": 400, "msg": "An async request requires at least one webhook.", "data": null}

Error Responses

Verified live against the sync and async endpoints:
// Sync — missing asset_id
HTTP 400 {"code": 400, "msg": "asset_id cannot be null or empty", "data": null}

// Sync / async — unknown asset_id (and many other validation failures)
HTTP 400 {"code": 400, "msg": "Request has exceeded the limit.", "data": null}
The string "Request has exceeded the limit." is shared across multiple failure paths on this endpoint: genuine rate-limit rejections as well as validation failures (unknown asset_id, wrong model, etc.). Branch on the HTTP 400 status only; do not parse msg to tell these cases apart.
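
One way to follow this advice in client code is sketched below. It treats any non-success response as a generic failure and keeps msg only for logging. The function name `classify_response` and the `("ok" | "error", value)` return convention are illustrative assumptions, not part of the API.

```python
import json


def classify_response(status, body_text):
    """Return ("ok", data) on success, otherwise ("error", message).

    The message is passed through for logging only. It is never
    inspected to guess the cause, because the same msg string is
    reused across unrelated failure paths.
    """
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        # Non-JSON body (for example a proxy error page).
        return ("error", body_text)
    if not isinstance(body, dict):
        return ("error", body_text)
    if status == 200 and body.get("code") == 200:
        return ("ok", body.get("data"))
    return ("error", body.get("msg"))
```

Callers can then decide on retry or abort from the status code and their own context (for example, whether the asset_id was just uploaded), without depending on the wording of msg.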

Supported Models

  • whisper-1

Request Body

Parameter | Type    | Required | Description
asset_id  | string  | Yes      | The uploaded audio or video asset ID to transcribe
model     | string  | No       | Transcription model (default: whisper-1)
speaker   | boolean | No       | Enable speaker labeling. Each segment will include speaker (e.g., SPEAKER_00). Doubles the price (default: false).

Code Examples

curl --request POST \
  --url https://mavi-backend.memories.ai/serve/api/v2/transcriptions/sync-generate-audio \
  --header 'Authorization: sk-mavi-...' \
  --header 'Content-Type: application/json' \
  --data '{
    "asset_id": "re_657929111888723968",
    "model": "whisper-1",
    "speaker": false
  }'
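
For reference, the same request can be built in Python with only the standard library. This is a sketch rather than an official client: the helper name `build_sync_request` is hypothetical, and the URL, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request

BASE_URL = "https://mavi-backend.memories.ai/serve/api/v2"


def build_sync_request(api_key, asset_id, speaker=False, model="whisper-1"):
    """Build (but do not send) a POST request for the sync endpoint."""
    payload = {"asset_id": asset_id, "model": model, "speaker": speaker}
    return urllib.request.Request(
        url=f"{BASE_URL}/transcriptions/sync-generate-audio",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The raw key goes in the header, with no "Bearer " prefix.
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen`. Note that `urlopen` raises `urllib.error.HTTPError` for HTTP 400 responses, so the error bodies shown earlier have to be read from the exception.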

Sync Response

{
  "code": 200,
  "msg": "success",
  "data": {
    "model": "whisper-1",
    "items": [
      { "text": "Hello, how are you today?", "start_time": 0.0, "end_time": 2.98 },
      { "text": "I'm doing well, thank you.", "start_time": 2.98, "end_time": 6.78 }
    ]
  },
  "failed": false,
  "success": true
}

Sync Response Parameters

Parameter               | Type   | Description
data.model              | string | Model used (e.g., whisper-1)
data.items              | array  | Transcription segments
data.items[].text       | string | Transcribed text for this segment
data.items[].start_time | number | Segment start time in seconds
data.items[].end_time   | number | Segment end time in seconds
data.items[].speaker    | string | Speaker label (e.g., SPEAKER_00). Only present when speaker=true.
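
As an illustration of how these fields fit together, the sketch below turns a parsed sync response into readable transcript lines. The function name `format_transcript` and the output layout are arbitrary choices made for this example. The speaker key is read with `.get` because it is only present when speaker=true.

```python
def format_transcript(response):
    """Turn a parsed sync response (a dict) into a list of text lines."""
    lines = []
    for item in response["data"]["items"]:
        # "speaker" is absent unless the request set speaker=true.
        speaker = item.get("speaker")
        prefix = f"{speaker}: " if speaker else ""
        lines.append(
            f"[{item['start_time']:.2f}-{item['end_time']:.2f}] {prefix}{item['text']}"
        )
    return lines
```

Applied to the sample response above, the first line would read `[0.00-2.98] Hello, how are you today?`.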

Async Response

The initial response returns a task_id. Results are delivered to your webhook URL when transcription completes.
{
  "code": 200,
  "msg": "success",
  "data": { "task_id": "ec2449885ba84c4f943a80ff0633158e" },
  "failed": false,
  "success": true
}

Callback Response Parameters

Parameter                         | Type           | Description
data.data.data                    | array          | Transcription segments
data.data.data[].start_time       | number         | Segment start time in seconds
data.data.data[].end_time         | number         | Segment end time in seconds
data.data.data[].text             | string         | Transcribed text
data.data.data[].speaker          | string or null | Speaker label, or null if speaker=false
data.data.usage_metadata.duration | number         | Audio duration in seconds
data.data.usage_metadata.model    | string         | Model used
task_id                           | string         | Task ID matching the initial response
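
A webhook handler might unpack the callback along the following lines. This is a sketch under assumptions: the page lists only field paths, not a full sample payload, so the nesting used here (a top-level task_id next to a data.data object) is inferred from those paths, and the function name `parse_callback` is hypothetical. Verify against a real callback before relying on it.

```python
def parse_callback(payload):
    """Flatten a callback payload (a dict) into a simpler structure.

    Assumed shape, inferred from the parameter paths above:
    {"task_id": ..., "data": {"data": {"data": [...], "usage_metadata": {...}}}}
    """
    inner = payload["data"]["data"]
    usage = inner.get("usage_metadata", {})
    segments = [
        {
            "start": seg["start_time"],
            "end": seg["end_time"],
            "text": seg["text"],
            # null (None) when the request did not enable speaker labels.
            "speaker": seg.get("speaker"),
        }
        for seg in inner["data"]
    ]
    return {
        "task_id": payload.get("task_id"),
        "duration": usage.get("duration"),
        "model": usage.get("model"),
        "segments": segments,
    }
```

Matching the returned task_id against the value from the initial async response lets a handler associate each callback with the request that produced it.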