ElevenLabs

curl --request POST \
  --url https://mavi-backend.memories.ai/serve/api/v2/transcriptions/speech-to-text \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "provider": "<string>",
  "asset_id": "<string>",
  "url": "<string>",
  "source_url": "<string>",
  "language_code": "<string>",
  "model_id": "<string>",
  "diarize": true,
  "timestamps_granularity": "<string>",
  "tag_audio_events": true,
  "num_speakers": 123,
  "file_format": "<string>",
  "source_lang": "<string>",
  "target_lang": "<string>"
}
'

POST /serve/api/v2/transcriptions/speech-to-text

Transcribes audio with the ElevenLabs Scribe V2 model (scribe_v2). Results are returned synchronously in the response body.

Pricing: $0.39/hour of audio, billed by actual audio duration (in seconds).
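As a rough illustration of the per-second billing above, the sketch below prorates the hourly rate over a clip's duration. The function name is hypothetical, and any rounding or minimum charge applied by actual billing is not documented here, so treat the result as an estimate only.

```python
PRICE_PER_HOUR_USD = 0.39  # rate quoted in the pricing note above


def estimate_cost_usd(duration_seconds: float) -> float:
    """Estimate the charge for a clip, prorated per second of audio.

    Assumption: no rounding or minimum charge is applied.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds * PRICE_PER_HOUR_USD / 3600


if __name__ == "__main__":
    # Estimate for a 90-second clip, printed to five decimal places.
    print(f"{estimate_cost_usd(90):.5f}")
```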

Audio Source

You must provide at least one of asset_id, url, or source_url. If more than one is supplied, they are used in this order of priority: asset_id > url > source_url.
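The priority rule can be mirrored on the client to check which source a request will end up using before it is sent. This is a minimal sketch: the helper name is hypothetical, and the server remains the authority on how the fields are actually resolved.

```python
from typing import Tuple

# Priority order stated above: asset_id > url > source_url.
SOURCE_PRIORITY = ("asset_id", "url", "source_url")


def effective_source(payload: dict) -> Tuple[str, str]:
    """Return (field_name, value) for the highest-priority source present."""
    for field in SOURCE_PRIORITY:
        value = payload.get(field)
        if value:  # skip missing or empty values
            return field, value
    raise ValueError("provide one of asset_id, url, or source_url")
```

For example, a payload containing both url and asset_id resolves to the asset_id entry.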

Parameters

Authorization

string

required

API key for authentication (e.g. sk-mai-xxx).

provider

string

default:"elevenlabs"

STT provider. Use elevenlabs for this endpoint.

asset_id

string

The unique identifier of an uploaded audio/video asset (e.g. re_xxx). Resolved to a signed GCS URL.

url

string

A publicly accessible audio URL.

source_url

string

A gs:// GCS path or public HTTP URL. GCS paths are converted to signed URLs automatically.

language_code

string

Language code (ISO 639-1, e.g. en, zh). If omitted, the provider auto-detects the language.

model_id

string

default:"scribe_v2"

Transcription model to use.

diarize

boolean

Enable speaker diarization.

timestamps_granularity

string

Timestamp level: none, segment, or word.

tag_audio_events

boolean

Tag audio events such as music, laughter, applause.

num_speakers

integer

Expected number of speakers (improves diarization).

file_format

string

Audio format hint (e.g. pcm_s16le_16000).

source_lang

string

Source language for translation.

target_lang

string

Target language for translation.
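The parameters above can be assembled into a request body with a small helper. The sketch below is one possible approach, not part of the API: build_payload is a hypothetical name, it fills in the documented defaults for provider and model_id, drops unset fields, and checks timestamps_granularity against the three documented values. Other validation the server may perform is not covered.

```python
from typing import Any, Dict

# Values documented for timestamps_granularity.
ALLOWED_GRANULARITY = {"none", "segment", "word"}


def build_payload(**fields: Any) -> Dict[str, Any]:
    """Build a JSON-serializable request body from keyword arguments."""
    # Documented defaults; callers may override them.
    payload: Dict[str, Any] = {"provider": "elevenlabs", "model_id": "scribe_v2"}
    # Omit parameters that were not set instead of sending nulls.
    payload.update({k: v for k, v in fields.items() if v is not None})

    granularity = payload.get("timestamps_granularity")
    if granularity is not None and granularity not in ALLOWED_GRANULARITY:
        raise ValueError(
            f"timestamps_granularity must be one of {sorted(ALLOWED_GRANULARITY)}"
        )
    return payload
```

A call such as build_payload(asset_id="re_xxx", diarize=True, timestamps_granularity="word", language_code=None) yields a body without language_code, leaving language detection to the provider.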

Code Examples

curl --request POST \
  --url https://mavi-backend.memories.ai/serve/api/v2/transcriptions/speech-to-text \
  --header 'Authorization: sk-mai-this_a_test_string_please_use_your_generated_key_during_testing' \
  --header 'Content-Type: application/json' \
  --data '{
    "provider": "elevenlabs",
    "asset_id": "re_657929111888723968",
    "model_id": "scribe_v2",
    "language_code": "en",
    "diarize": true,
    "timestamps_granularity": "word",
    "num_speakers": 2
  }'
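The same request can be issued from Python using only the standard library. This is a sketch under a few assumptions: the function names and the 300-second timeout are illustrative, and the failure handling assumes error responses carry the same code, msg, and success fields as the success example, which is not confirmed here.

```python
import json
import urllib.request

ENDPOINT = "https://mavi-backend.memories.ai/serve/api/v2/transcriptions/speech-to-text"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def transcribe(api_key: str, payload: dict, timeout: float = 300.0) -> dict:
    """Send the request and return the `data` object of the response.

    The call is synchronous, so the timeout (an assumed value) should
    be generous for long recordings.
    """
    request = build_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: failures are reported through the `success` flag.
    if not body.get("success"):
        raise RuntimeError(f"transcription failed: {body.get('code')} {body.get('msg')}")
    return body["data"]
```

Calling transcribe("sk-mai-xxx", {"provider": "elevenlabs", "asset_id": "re_xxx", "diarize": True}) with a real key and asset ID would return the data object described under Response Parameters.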

Response

{
  "code": 200,
  "msg": "success",
  "data": {
    "language_code": "en",
    "language_probability": 0.98,
    "text": "Hello, how are you today? I'm doing well, thank you.",
    "words": [
      {
        "text": "Hello,",
        "start": 0.0,
        "end": 0.52,
        "type": "word",
        "speaker_id": "speaker_0"
      },
      {
        "text": " ",
        "start": 0.52,
        "end": 0.52,
        "type": "spacing"
      },
      {
        "text": "how",
        "start": 0.52,
        "end": 0.78,
        "type": "word",
        "speaker_id": "speaker_0"
      }
    ]
  },
  "failed": false,
  "success": true
}

Response Parameters

Parameter	Type	Description
data.language_code	string	Detected language code (ISO 639-1)
data.language_probability	number	Confidence of language detection (0.0–1.0)
data.text	string	Full transcription text
data.words	array[object]	Word-level transcription with timing
data.words[].text	string	The word or spacing text
data.words[].start	number	Start time in seconds
data.words[].end	number	End time in seconds
data.words[].type	string	Token type: `word`, `spacing`, or `audio_event`
data.words[].speaker_id	string	Speaker identifier (e.g. `speaker_0`). Only present when `diarize=true`.

Timestamps are in seconds (e.g. 0.52).
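When diarize is enabled, the entries in data.words can be folded into per-speaker turns. The sketch below is one way to do that and is not part of the API: speaker_turns is a hypothetical helper, and it assumes tokens without a speaker_id (such as spacing) belong to the turn in progress.

```python
from typing import List


def speaker_turns(words: List[dict]) -> List[dict]:
    """Merge consecutive tokens from the same speaker into turns."""
    turns: List[dict] = []
    for token in words:
        speaker = token.get("speaker_id")
        # Tokens with no speaker_id, or the same speaker, extend the current turn.
        if turns and speaker in (None, turns[-1]["speaker_id"]):
            turns[-1]["text"] += token["text"]
            turns[-1]["end"] = token["end"]
        else:
            turns.append({
                "speaker_id": speaker,
                "start": token["start"],
                "end": token["end"],
                "text": token["text"],
            })
    return turns


if __name__ == "__main__":
    # The three tokens from the sample response above.
    sample = [
        {"text": "Hello,", "start": 0.0, "end": 0.52, "type": "word", "speaker_id": "speaker_0"},
        {"text": " ", "start": 0.52, "end": 0.52, "type": "spacing"},
        {"text": "how", "start": 0.52, "end": 0.78, "type": "word", "speaker_id": "speaker_0"},
    ]
    print(speaker_turns(sample))
```

With the sample tokens, all three collapse into a single speaker_0 turn spanning 0.0 to 0.78 seconds. If diarization is off, no token carries a speaker_id and the whole transcript becomes one turn.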
