Uses the AssemblyAI Universal-2 model. Submits a transcription job, polls until completion, and returns the full result in a single response.
Pricing: $0.15 per hour of audio, billed by actual audio duration (in seconds).
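Since billing is by actual duration, the per-request cost is a straightforward proration of the hourly rate (`transcription_cost` is an illustrative helper, not part of the API):

```python
def transcription_cost(audio_duration_seconds: int, rate_per_hour: float = 0.15) -> float:
    """Cost in USD for a clip billed by its actual duration in seconds."""
    return audio_duration_seconds / 3600 * rate_per_hour

# The 52-second clip in the sample response below costs about $0.0022.
print(round(transcription_cost(52), 4))  # 0.0022
```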
## Audio Source

You must provide one of the following (priority: `asset_id` > `url` > `source_url`).

## Parameters

| Parameter | Description |
|---|---|
| `Authorization` (header) | API key for authentication (e.g. `sk-mai-xxx`). |
| `provider` | STT provider. Must be `assemblyai`. |
| `asset_id` | The unique identifier of an uploaded audio/video asset (e.g. `re_xxx`). Resolved to a signed GCS URL. |
| `url` | A publicly accessible audio URL. |
| `source_url` | A `gs://` GCS path or public HTTP URL. GCS paths are converted to signed URLs automatically. |
| `language_code` | Language code (ISO 639-1, e.g. `en`, `zh`). If omitted, the provider auto-detects the language. |
| `punctuate` | Automatically add punctuation and casing. |
| `format_text` | Format numbers, dates, etc. |
| `speaker_labels` | Enable speaker diarization. |
| `speakers_expected` | Expected number of speakers. |
| `language_detection` | Enable automatic language detection. |
| `language_confidence_threshold` | Confidence threshold for language detection (0.0–1.0). |
| `speech_model` | Speech recognition model to use. |
| `speech_threshold` | Speech confidence threshold (0.0–1.0). |
| `disfluencies` | Include disfluencies (um, uh, etc.). |
| `sentiment_analysis` | Enable sentiment analysis per utterance. |
| `entity_detection` | Enable entity detection (names, locations, etc.). |
| `auto_highlights` | Automatically highlight key phrases. |
| `content_safety` | Enable content safety detection. |
| `iab_categories` | Enable IAB topic categorization. |
| `auto_chapters` | Automatically generate chapters. |
| `summary_model` | Summarization model: `informative` or `conversational`. |
| `summary_type` | Summary format: `bullets`, `bullets_verbose`, `headline`, `paragraph`, or `gist`. |
| `redact_pii_policies` | PII types to redact (e.g. `email_address`, `phone_number`, `person_name`). |
| `redact_pii_sub` | PII replacement strategy: `hash` or `entity_name`. |
| `redact_pii_audio` | Redact PII from audio output. |
| `redact_pii_audio_quality` | Redacted audio quality: `mp3` or `wav`. |
| `filter_profanity` | Filter profanity from transcript. |
| `word_boost` | List of words to boost recognition. |
| `boost_param` | Boost strength: `low`, `default`, or `high`. |
| `custom_spelling` | Custom spelling corrections. |
| `webhook_url` | AssemblyAI webhook callback URL. |
| `multichannel` | Enable multi-channel transcription. |
| `audio_start_from` | Start transcription from this time (milliseconds). |
| `audio_end_at` | End transcription at this time (milliseconds). |
| `custom_topics` | Enable custom topic detection. |
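The audio-source priority described above amounts to a first-match rule. A minimal sketch (`resolve_audio_source` is a hypothetical helper, not part of the API):

```python
def resolve_audio_source(asset_id=None, url=None, source_url=None):
    """Return the (field, value) pair the API would use,
    honoring the priority asset_id > url > source_url."""
    for field, value in (("asset_id", asset_id), ("url", url), ("source_url", source_url)):
        if value:
            return field, value
    raise ValueError("one of asset_id, url, or source_url is required")
```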
## Code Examples
```bash
curl --request POST \
  --url https://mavi-backend.memories.ai/serve/api/v2/transcriptions/speech-to-text \
  --header 'Authorization: sk-mai-this_a_test_string_please_use_your_generated_key_during_testing' \
  --header 'Content-Type: application/json' \
  --data '{
    "provider": "assemblyai",
    "asset_id": "re_657929111888723968",
    "language_code": "en",
    "speaker_labels": true,
    "speakers_expected": 2,
    "punctuate": true,
    "format_text": true
  }'
```
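The same request in Python using only the standard library (a sketch: `build_payload` and `transcribe` are illustrative helpers, and since the endpoint blocks until the job completes, the timeout is deliberately generous):

```python
import json
import urllib.request

API_URL = "https://mavi-backend.memories.ai/serve/api/v2/transcriptions/speech-to-text"

def build_payload(asset_id: str, speakers: int = 2) -> dict:
    """Request body matching the curl example above."""
    return {
        "provider": "assemblyai",
        "asset_id": asset_id,
        "language_code": "en",
        "speaker_labels": True,
        "speakers_expected": speakers,
        "punctuate": True,
        "format_text": True,
    }

def transcribe(api_key: str, asset_id: str) -> dict:
    """POST the job and return the completed transcript (the `data` object)."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(asset_id)).encode(),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=600) as resp:  # blocks until the job finishes
        return json.load(resp)["data"]
```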
## Response

```json
{
  "code": 200,
  "msg": "success",
  "data": {
    "id": "9a27d0d5-d2db-448c-823c-f098507789be",
    "status": "completed",
    "language_code": "en_us",
    "audio_url": "https://storage.googleapis.com/...",
    "audio_duration": 52,
    "text": "Hello, how are you today? I'm doing well, thank you.",
    "words": [
      {
        "text": "Hello,",
        "start": 0,
        "end": 520,
        "confidence": 0.99,
        "speaker": "A"
      },
      {
        "text": "how",
        "start": 520,
        "end": 780,
        "confidence": 0.98,
        "speaker": "A"
      }
    ],
    "utterances": [
      {
        "confidence": 0.97,
        "start": 0,
        "end": 2980,
        "text": "Hello, how are you today?",
        "speaker": "A"
      },
      {
        "confidence": 0.95,
        "start": 2980,
        "end": 5200,
        "text": "I'm doing well, thank you.",
        "speaker": "B"
      }
    ],
    "confidence": 0.97,
    "punctuate": true,
    "format_text": true,
    "speaker_labels": true,
    "speakers_expected": 2
  },
  "failed": false,
  "success": true
}
```
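When `speaker_labels` is enabled, the `utterances` array makes per-speaker statistics easy to derive. For example, total talk time per speaker (using the utterance timings from the sample response; `talk_time_by_speaker` is an illustrative helper):

```python
def talk_time_by_speaker(utterances: list) -> dict:
    """Seconds each speaker spent talking, from utterance start/end times (ms)."""
    totals = {}
    for u in utterances:
        totals[u["speaker"]] = totals.get(u["speaker"], 0.0) + (u["end"] - u["start"]) / 1000
    return totals

sample = [
    {"speaker": "A", "start": 0, "end": 2980, "text": "Hello, how are you today?"},
    {"speaker": "B", "start": 2980, "end": 5200, "text": "I'm doing well, thank you."},
]
print(talk_time_by_speaker(sample))  # {'A': 2.98, 'B': 2.22}
```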
## Response Parameters

| Parameter | Type | Description |
|---|---|---|
| data.id | string | AssemblyAI transcript ID |
| data.status | string | Transcript status; `completed` on success, since the endpoint polls until the job finishes |
| data.language_code | string | Detected language code |
| data.audio_duration | integer | Audio duration in seconds |
| data.text | string | Full transcription text |
| data.confidence | number | Overall transcription confidence (0.0–1.0) |
| data.words | array[object] | Word-level transcription with timing (milliseconds) |
| data.words[].text | string | The transcribed word |
| data.words[].start | integer | Start time in milliseconds |
| data.words[].end | integer | End time in milliseconds |
| data.words[].confidence | number | Word confidence score |
| data.words[].speaker | string | Speaker label (e.g. A, B). Only present when speaker_labels=true. |
| data.utterances | array[object] | Sentence-level segments (only when speaker_labels=true) |
| data.utterances[].text | string | Utterance text |
| data.utterances[].start | integer | Start time in milliseconds |
| data.utterances[].end | integer | End time in milliseconds |
| data.utterances[].confidence | number | Utterance confidence score |
| data.utterances[].speaker | string | Speaker label |
Timestamps are in milliseconds (e.g. 520).
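Because every timestamp is in milliseconds, turning utterances into subtitle cues is purely a formatting exercise. A sketch that renders the `utterances` array as SRT (`ms_to_srt` and `utterances_to_srt` are illustrative helpers):

```python
def ms_to_srt(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, e.g. 520 -> 00:00:00,520."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, millis = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{millis:03}"

def utterances_to_srt(utterances: list) -> str:
    """Render utterance segments as numbered SRT cue blocks."""
    blocks = []
    for i, u in enumerate(utterances, start=1):
        blocks.append(f"{i}\n{ms_to_srt(u['start'])} --> {ms_to_srt(u['end'])}\n{u['text']}")
    return "\n\n".join(blocks)
```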