POST /audio-stream/start
Start Audio Stream Transcription
curl --request POST \
  --url https://mavi-backend.memories.ai/serve/api/v2/audio-stream/start \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "audio_url": "rtmp://example.com/live/audio",
  "language_code": "en",
  "punctuate": false,
  "format_text": false,
  "language_detection": false,
  "language_confidence_threshold": 0,
  "audio_start_from": 0,
  "audio_end_at": 0,
  "multichannel": false,
  "speech_models": [
    "<string>"
  ],
  "speech_threshold": 0,
  "disfluencies": false,
  "speaker_labels": false,
  "speakers_expected": 0,
  "sentiment_analysis": false,
  "entity_detection": false,
  "auto_highlights": false,
  "content_safety": false,
  "iab_categories": false,
  "auto_chapters": false,
  "summarization": false,
  "summary_model": "<string>",
  "summary_type": "<string>",
  "custom_topics": false,
  "topics": [
    "<string>"
  ],
  "redact_pii": false,
  "redact_pii_sub": "<string>",
  "redact_pii_policies": [
    "<string>"
  ],
  "redact_pii_audio": false,
  "redact_pii_audio_quality": "<string>",
  "filter_profanity": false,
  "custom_spelling": [
    {}
  ],
  "speech_understanding": {}
}
'


Access Required: To use this API endpoint, please contact us at contact@memories.ai to enable stream processing features for your account.
This endpoint starts real-time audio stream transcription. The server pulls audio from the provided stream URL, decodes it to PCM via FFmpeg, and streams it to the selected provider (ElevenLabs or AssemblyAI) over WebSocket. Every message returned by the provider is forwarded verbatim to your webhook callback. Billing occurs every 5 seconds of audio streamed.
When to use this vs the WebSocket endpoint?
  • Use this HTTP endpoint when you have a stream URL (RTMP/RTSP/HLS) and want the server to handle audio decoding and streaming.
  • Use the WebSocket endpoint when your client can send audio directly (e.g., browser microphone).
Pricing (varies by provider):
Provider | Rate | Per 5s billing cycle | Per minute
AssemblyAI | $0.15/hour | $0.000208 | $0.0025
ElevenLabs | $0.39/hour | $0.000542 | $0.0065
Cost (USD) = duration × rate / 3600
Where:
  • duration: Audio duration in seconds
  • rate: 0.15 (AssemblyAI) or 0.39 (ElevenLabs), in USD per hour
  • Charges: pre-check at start, then billed every 5 seconds of audio streamed
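The formula above can be computed directly. As a sketch: rounding the duration up to the next 5-second billing cycle is an assumption inferred from the per-5-second billing described above, not a documented guarantee.

```python
import math

# USD per hour, from the pricing table above
RATES_PER_HOUR = {"assemblyai": 0.15, "elevenlabs": 0.39}

def estimate_cost(duration_seconds: float, provider: str) -> float:
    """Cost (USD) = duration x rate / 3600.

    Rounding up to a whole 5-second billing cycle is an assumption,
    inferred from the billing notes above.
    """
    billed_seconds = math.ceil(duration_seconds / 5) * 5
    return billed_seconds * RATES_PER_HOUR[provider] / 3600

# 10 minutes of AssemblyAI streaming
print(f"${estimate_cost(600, 'assemblyai'):.4f}")  # $0.0250
```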

Supported Protocols

  • RTMP (Recommended)
  • RTSP
  • HLS (.m3u8)
  • HTTP/HTTPS (direct audio URLs)

Key Features

  • Real-time transcription via ElevenLabs or AssemblyAI (controlled by provider parameter)
  • Server-side audio decoding (FFmpeg) — no client-side processing needed
  • Verbatim callback: every upstream message forwarded as-is to your webhook
  • Real-time billing every 5 seconds of audio
  • Auto-stop on insufficient balance (status 402)
  • All provider-specific parameters transparently forwarded

Architecture

                        Your Server
                            |
                     POST /audio-stream/start
                     { audio_url, provider, ... }
                            |
                            v
                   +------------------+
                   | Memories.ai      |
                   |                  |
audio_url -------> | FFmpeg (decode)  |
                   |     |            |
                   |     v            |
                   | PCM 16kHz mono   |
                   |     |            |
                   |     v            |
                   | WebSocket -------+-------> ElevenLabs / AssemblyAI
                   |                  |
                   |  <-- messages ---|<------- Provider responses
                   |     |            |
                   |     v            |
                   | Webhook callback |-------> Your callback URL
                   +------------------+

Code Example

import requests

BASE_URL = "https://mavi-backend.memories.ai/serve/api/v2"
API_KEY = "sk-mai-this_a_test_string_please_use_your_generated_key_during_testing"
HEADERS = {
    "Authorization": API_KEY,
    "Content-Type": "application/json"
}

def start_audio_stream(audio_url: str):
    url = f"{BASE_URL}/audio-stream/start"
    data = {
        "audio_url": audio_url,
        "provider": "elevenlabs",
        "language_code": "en",
        "model_id": "scribe_v2_realtime",
        "diarize": True,
        "num_speakers": 2
    }
    resp = requests.post(url, json=data, headers=HEADERS)
    resp.raise_for_status()  # fail fast on HTTP errors
    return resp.json()

result = start_audio_stream("rtmp://example.com/live/audio")
print(result)
print(f"Task ID: {result['data']['task_id']}")
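A stream started this way can later be stopped via the /audio-stream/stop endpoint referenced in the Status Codes section. Only the path is confirmed by this page; the request shape below (POST with a task_id body field) is an assumption for illustration.

```python
BASE_URL = "https://mavi-backend.memories.ai/serve/api/v2"
API_KEY = "sk-mai-this_a_test_string_please_use_your_generated_key_during_testing"
HEADERS = {"Authorization": API_KEY, "Content-Type": "application/json"}

def build_stop_request(task_id: str) -> tuple:
    """Assumed request shape: only the /audio-stream/stop path is confirmed
    by this page; the {"task_id": ...} body is a guess."""
    return f"{BASE_URL}/audio-stream/stop", {"task_id": task_id}

def stop_audio_stream(task_id: str) -> dict:
    import requests  # deferred so the request builder stays dependency-free
    url, body = build_stop_request(task_id)
    resp = requests.post(url, json=body, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()
```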

Response

Returns the task information for the started audio stream.
{
  "code": 200,
  "message": "success",
  "data": {
    "task_id": "660e8400-e29b-41d4-a716-446655440001",
    "message": "Audio stream transcription started"
  }
}

Request Parameters

Required

Parameter | Type | Description
audio_url | string | Audio stream URL (RTMP, RTSP, HLS, or HTTP)
provider | string | Transcription provider: elevenlabs or assemblyai

Common Parameters

Parameter | Type | Default | Description
language_code | string | - | Language code for transcription (e.g., en, zh, es, fr)

ElevenLabs Parameters

These parameters are forwarded when provider=elevenlabs.
Parameter | Type | Default | Description
model_id | string | scribe_v2_realtime | Model to use for transcription
tag_audio_events | boolean | false | Tag audio events (music, laughter, etc.)
num_speakers | integer | - | Expected number of speakers for diarization
diarize | boolean | false | Enable speaker diarization
enable_logging | boolean | false | Enable server-side logging
inactivity_timeout | integer | - | Session timeout (seconds) when no audio is received

AssemblyAI Parameters

These parameters are forwarded when provider=assemblyai.
Parameter | Type | Default | Description
punctuate | boolean | false | Add punctuation to the transcript
format_text | boolean | false | Format text in the transcript (numbers, dates)
language_detection | boolean | false | Enable automatic language detection
language_confidence_threshold | number | 0.0 | Confidence threshold for language detection (0.0-1.0)
audio_start_from | integer | 0 | Start transcription from this timestamp (ms)
audio_end_at | integer | 0 | End transcription at this timestamp (ms)
multichannel | boolean | false | Enable multi-channel audio processing
speech_models | array | - | Array of speech models to use
speech_threshold | number | 0.0 | Speech detection threshold (0.0-1.0)
disfluencies | boolean | false | Include disfluencies (um, uh) in transcript
speaker_labels | boolean | false | Enable speaker diarization
speakers_expected | integer | 0 | Expected number of speakers
sentiment_analysis | boolean | false | Enable sentiment analysis
entity_detection | boolean | false | Enable entity detection
auto_highlights | boolean | false | Enable automatic highlights extraction
content_safety | boolean | false | Enable content safety detection
iab_categories | boolean | false | Enable IAB category classification
auto_chapters | boolean | false | Enable automatic chapter generation
summarization | boolean | false | Enable automatic summarization
summary_model | string | - | Model for summarization (informative or conversational)
summary_type | string | - | Summary type (bullets or paragraph)
custom_topics | boolean | false | Enable custom topic detection
topics | array | - | Array of custom topics to detect
redact_pii | boolean | false | Redact personally identifiable information
redact_pii_sub | string | - | PII redaction substitution method
redact_pii_policies | array | - | Array of PII policies to apply
redact_pii_audio | boolean | false | Redact PII from audio
redact_pii_audio_quality | string | - | Quality for PII audio redaction
filter_profanity | boolean | false | Filter profanity from transcript
custom_spelling | array | - | Array of custom spelling corrections
speech_understanding | object | - | Speech understanding configuration
Any parameters not listed above can still be passed in the request body. They will be captured and forwarded to the upstream provider via the URL query string. This ensures forward compatibility with new provider features.
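For instance (a sketch; some_future_option is a hypothetical parameter name used only for illustration), an unlisted option simply rides along in the same request body:

```python
# Known and unknown fields share one request body; anything the server does
# not recognize is forwarded to the provider via the URL query string,
# per the forward-compatibility note above.
payload = {
    "audio_url": "rtmp://example.com/live/audio",
    "provider": "assemblyai",
    "punctuate": True,
    "some_future_option": "value",  # hypothetical, for illustration only
}
```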

Response Parameters

Parameter | Type | Description
code | string | Response code (200 indicates success)
message | string | Response message
data.task_id | string | Unique identifier of the transcription task
data.message | string | Status message about the stream start

Callback Response Parameters

Callbacks are sent continuously — one for each message received from the upstream provider.
Parameter | Type | Description
code | string | Response code (200 indicates callback delivery success)
message | string | Response message ("SUCCESS")
task_id | string | The task ID associated with this stream
data.status | integer | Status code (0 for normal message, see Status Codes below)
data.message | string | Status message (null for normal messages)
data.transcript | object | Verbatim JSON from the upstream provider (null for error/control statuses)
The data.transcript field contains the raw, unmodified response from the selected provider. The structure differs between ElevenLabs and AssemblyAI. Refer to each provider’s documentation for detailed field descriptions.

Status Codes

Status | Name | Description | Stream Continues
0 | Message | Normal transcription message from provider | Yes
-1 | Error | Processing or connection error | No
14 | User Stopped | User called /audio-stream/stop | No
16 | Capacity Reached | Server capacity limit reached | No
402 | Insufficient Balance | User balance insufficient | No
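A minimal dispatcher over these status codes might look like the sketch below; the transcript payload shown is illustrative only, since its structure is provider-specific.

```python
# Terminal statuses from the table above: once received, the stream has ended.
TERMINAL_STATUSES = {
    -1: "error",
    14: "user stopped",
    16: "capacity reached",
    402: "insufficient balance",
}

def handle_callback(body: dict) -> str:
    """Dispatch one webhook callback by data.status."""
    status = body["data"]["status"]
    if status == 0:
        # Normal message: data.transcript holds the provider's verbatim JSON.
        _transcript = body["data"]["transcript"]
        return "message"
    return TERMINAL_STATUSES.get(status, f"unknown status {status}")

callback = {
    "code": "200",
    "message": "SUCCESS",
    "task_id": "660e8400-e29b-41d4-a716-446655440001",
    "data": {"status": 0, "message": None, "transcript": {"text": "..."}},
}
print(handle_callback(callback))  # message
```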

Important Notes

  • provider is required: You must specify elevenlabs or assemblyai. Without it the request will fail.
  • Webhook required: Configure your webhook URL in user settings before using this API.
  • Verbatim callbacks: Each callback contains the exact JSON message from the provider — the server does not transform or aggregate the data.
  • Real-time billing: Billing occurs every 5 seconds of audio data streamed to the provider. Auto-stops when balance is insufficient.
  • Pre-charge: Balance is checked at start (one 5-second unit). If insufficient, the task is not started (status 402).
  • Parameter transparency: Parameters specific to a provider are forwarded as-is. Parameters not relevant to the selected provider are silently ignored by the provider.

Supported Languages

Common language codes (supported by both providers):
  • en - English
  • zh - Chinese
  • es - Spanish
  • fr - French
  • de - German
  • ja - Japanese
  • ko - Korean
  • And many more…

Rate Limiting

  • Maximum concurrent streams: Each user can run N concurrent stream tasks (video + audio combined)
  • Capacity check: Returns status 16 if server capacity is reached
  • Balance check: Returns status 402 if insufficient balance at start

Authorizations

Authorization (string, header, required)

Body

application/json

audio_url (string, required): Audio stream URL (RTMP, RTSP, HLS, or HTTP). Example: "rtmp://example.com/live/audio"
provider (string, required): Transcription provider, elevenlabs or assemblyai
language_code (string): Language code for transcription. Example: "en"
punctuate (boolean, default: false): Add punctuation to the transcript
format_text (boolean, default: false): Format text in the transcript
language_detection (boolean, default: false): Enable automatic language detection
language_confidence_threshold (number, default: 0): Confidence threshold for language detection
audio_start_from (integer, default: 0): Start transcription from this timestamp (milliseconds)
audio_end_at (integer, default: 0): End transcription at this timestamp (milliseconds)
multichannel (boolean, default: false): Enable multi-channel audio processing
speech_models (string[]): Array of speech models to use
speech_threshold (number, default: 0): Speech detection threshold (0.0-1.0)
disfluencies (boolean, default: false): Include disfluencies in the transcript
speaker_labels (boolean, default: false): Enable speaker diarization
speakers_expected (integer, default: 0): Expected number of speakers
sentiment_analysis (boolean, default: false): Enable sentiment analysis
entity_detection (boolean, default: false): Enable entity detection
auto_highlights (boolean, default: false): Enable automatic highlights extraction
content_safety (boolean, default: false): Enable content safety detection
iab_categories (boolean, default: false): Enable IAB category classification
auto_chapters (boolean, default: false): Enable automatic chapter generation
summarization (boolean, default: false): Enable automatic summarization
summary_model (string): Model to use for summarization
summary_type (string): Type of summary to generate
custom_topics (boolean, default: false): Enable custom topic detection
topics (string[]): Array of custom topics to detect
redact_pii (boolean, default: false): Redact personally identifiable information
redact_pii_sub (string): PII redaction substitution method
redact_pii_policies (string[]): Array of PII policies to apply
redact_pii_audio (boolean, default: false): Redact PII from audio
redact_pii_audio_quality (string): Quality setting for PII audio redaction
filter_profanity (boolean, default: false): Filter profanity from transcript
custom_spelling (object[]): Array of custom spelling corrections
speech_understanding (object): Speech understanding configuration

Response

200 - application/json

Audio stream started successfully

code (string): Response code. Example: 200
message (string): Response message. Example: "success"
data (object): Task information (task_id, message)