Start Audio Stream Transcription

POST /audio-stream/start
curl --request POST \
  --url https://mavi-backend.memories.ai/serve/api/v2/audio-stream/start \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "audio_url": "rtmp://example.com/live/audio",
  "language_code": "en",
  "punctuate": false,
  "format_text": false,
  "language_detection": false,
  "language_confidence_threshold": 0,
  "audio_start_from": 0,
  "audio_end_at": 0,
  "multichannel": false,
  "speech_models": [
    "<string>"
  ],
  "speech_threshold": 0,
  "disfluencies": false,
  "speaker_labels": false,
  "speakers_expected": 0,
  "sentiment_analysis": false,
  "entity_detection": false,
  "auto_highlights": false,
  "content_safety": false,
  "iab_categories": false,
  "auto_chapters": false,
  "summarization": false,
  "summary_model": "<string>",
  "summary_type": "<string>",
  "custom_topics": false,
  "topics": [
    "<string>"
  ],
  "redact_pii": false,
  "redact_pii_sub": "<string>",
  "redact_pii_policies": [
    "<string>"
  ],
  "redact_pii_audio": false,
  "redact_pii_audio_quality": "<string>",
  "filter_profanity": false,
  "custom_spelling": [
    {}
  ],
  "speech_understanding": {}
}
'
Access Required: To use this API endpoint, please contact us at contact@memories.ai to enable stream processing features for your account.
This endpoint starts real-time audio stream transcription powered by AssemblyAI. The stream is automatically segmented every 5 seconds, with both partial and final transcripts sent via webhook callbacks.
Pricing:
Cost (USD) = duration × 1.5 / 3600
  • Base rate: $1.50 per hour
  • Example: $0.00208 per 5-second segment
  • Example: $0.025 per minute
Where:
  • duration: audio duration in seconds (5 seconds per segment)
  • Charges are applied at start (a pre-charge for one segment) and then for each segment processed
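The pricing formula above can be checked in a few lines of Python (a sketch; the $1.50/hour rate and 5-second segment length are taken from the pricing section):

```python
RATE_PER_HOUR = 1.5   # USD per hour, base rate from the pricing section
SEGMENT_SECONDS = 5   # fixed segment length

def stream_cost(duration_seconds: float) -> float:
    """Cost in USD: duration * 1.5 / 3600."""
    return duration_seconds * RATE_PER_HOUR / 3600

print(round(stream_cost(SEGMENT_SECONDS), 5))  # 0.00208 per 5-second segment
print(stream_cost(60))                         # 0.025 per minute
```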

Supported Protocols

  • RTMP (Recommended)
  • RTSP

Key Features

  • ✅ Real-time transcription
  • ✅ Speaker identification (diarization)
  • ✅ Sentiment analysis
  • ✅ PII redaction
  • ✅ Content safety detection
  • ✅ Automatic language detection
  • ✅ Custom vocabulary boost
  • ✅ Punctuation and formatting

Code Example

import requests

BASE_URL = "https://mavi-backend.memories.ai/serve/api/v2"
API_KEY = "sk-mai-this_a_test_string_please_use_your_generated_key_during_testing"
HEADERS = {
    "Authorization": API_KEY,
    "Content-Type": "application/json"
}

def start_audio_stream(audio_url: str, language_code: str = "en"):
    url = f"{BASE_URL}/audio-stream/start"
    data = {
        "audio_url": audio_url,
        "language_code": language_code,
        "speaker_labels": True,
        "speakers_expected": 2,
        "sentiment_analysis": True,
        "punctuate": True,
        "format_text": True
    }
    resp = requests.post(url, json=data, headers=HEADERS)
    return resp.json()

# Usage example
result = start_audio_stream("rtmp://example.com/live/audio", language_code="en")
print(result)
print(f"Task ID: {result['data']['task_id']}")

Response

Returns the task information for the started audio stream.
{
  "code": 200,
  "message": "success",
  "data": {
    "task_id": "660e8400-e29b-41d4-a716-446655440001",
    "message": "Audio stream transcription started successfully"
  }
}

Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| audio_url | string | Yes | - | Audio stream URL (RTMP or RTSP protocol) |
| language_code | string | No | - | Language code for transcription (e.g., en, zh, es, fr) |
| punctuate | boolean | No | false | Add punctuation to the transcript |
| format_text | boolean | No | false | Format text in the transcript |
| language_detection | boolean | No | false | Enable automatic language detection |
| language_confidence_threshold | number | No | 0.0 | Confidence threshold for language detection |
| audio_start_from | integer | No | 0 | Start transcription from this timestamp (milliseconds) |
| audio_end_at | integer | No | 0 | End transcription at this timestamp (milliseconds) |
| multichannel | boolean | No | false | Enable multi-channel audio processing |
| speech_models | array | No | - | Array of speech models to use |
| speech_threshold | number | No | 0.0 | Speech detection threshold (0.0-1.0) |
| disfluencies | boolean | No | false | Include disfluencies in the transcript |
| speaker_labels | boolean | No | false | Enable speaker diarization |
| speakers_expected | integer | No | 0 | Expected number of speakers |
| sentiment_analysis | boolean | No | false | Enable sentiment analysis |
| entity_detection | boolean | No | false | Enable entity detection |
| auto_highlights | boolean | No | false | Enable automatic highlights extraction |
| content_safety | boolean | No | false | Enable content safety detection |
| iab_categories | boolean | No | false | Enable IAB category classification |
| auto_chapters | boolean | No | false | Enable automatic chapter generation |
| summarization | boolean | No | false | Enable automatic summarization |
| summary_model | string | No | - | Model to use for summarization |
| summary_type | string | No | - | Type of summary to generate |
| custom_topics | boolean | No | false | Enable custom topic detection |
| topics | array | No | - | Array of custom topics to detect |
| redact_pii | boolean | No | false | Redact personally identifiable information |
| redact_pii_sub | string | No | - | PII redaction substitution method |
| redact_pii_policies | array | No | - | Array of PII policies to apply |
| redact_pii_audio | boolean | No | false | Redact PII from audio |
| redact_pii_audio_quality | string | No | - | Quality setting for PII audio redaction |
| filter_profanity | boolean | No | false | Filter profanity from the transcript |
| custom_spelling | array | No | - | Array of custom spelling corrections |
| speech_understanding | object | No | - | Speech understanding configuration |
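As a sketch of how several of these parameters combine, the payload below enables diarization, PII redaction, and summarization; the `redact_pii_policies` and `summary_type` values are illustrative placeholders, since the accepted values are not listed on this page:

```python
import json

# Hypothetical example payload; policy/summary values are placeholders.
payload = {
    "audio_url": "rtmp://example.com/live/audio",
    "language_code": "en",
    "punctuate": True,
    "format_text": True,
    "speaker_labels": True,        # enable diarization
    "speakers_expected": 2,
    "redact_pii": True,
    "redact_pii_policies": ["person_name", "phone_number"],  # placeholder values
    "summarization": True,
    "summary_type": "bullets",     # placeholder value
}
print(json.dumps(payload, indent=2))
```

This body would be sent exactly as in the earlier `start_audio_stream` example, via `requests.post(url, json=payload, headers=HEADERS)`.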

Response Parameters

| Parameter | Type | Description |
|---|---|---|
| code | string | Response code (200 indicates success) |
| message | string | Response message describing the operation result |
| data | object | Response data object containing task information |
| data.task_id | string | Unique identifier of the transcription task |
| data.message | string | Status message about the stream start |

Callback Response Parameters

Callbacks are sent continuously as transcription progresses.
| Parameter | Type | Description |
|---|---|---|
| code | string | Response code (200 indicates success) |
| message | string | Response message ("SUCCESS") |
| task_id | string | The task ID associated with this stream |
| data | object | Wrapper object containing transcription results |
| data.segment_index | integer | Index of the processed segment |
| data.segment_start_time | string | Segment start time in seconds |
| data.segment_end_time | string | Segment end time in seconds |
| data.status | integer | Status code (0 for success; see Status Codes) |
| data.message | string | Status message describing the result |
| data.transcript | object | AssemblyAI transcription object (null for error statuses) |
| data.transcript.audio_url | string | URL of the audio stream |
| data.transcript.id | string | AssemblyAI transcription ID |
| data.transcript.status | string | Transcription status ("completed") |
| data.transcript.language_code | string | Detected or specified language code |
| data.transcript.confidence | number | Overall confidence score (0-1) |
| data.transcript.text | string | Full transcribed text |
| data.transcript.words | array | Array of word-level details with timestamps |
| data.transcript.utterances | array | Array of utterances (segments by speaker) |
| data.transcript.speaker_labels | boolean | Whether speaker identification was enabled |
| data.transcript.speakers_expected | integer | Expected number of speakers |
| data.transcript.sentiment_analysis | boolean | Whether sentiment analysis was enabled |
| data.transcript.auto_highlights | boolean | Whether auto highlights was enabled |
| data.transcript.auto_highlights_result | object | Auto highlights results |
| data.transcript.iab_categories | boolean | Whether IAB categorization was enabled |
| data.transcript.content_safety | boolean | Whether content safety detection was enabled |
| data.transcript.summary | string | Generated summary, if enabled |
| data.transcript.chapters | array | Auto-generated chapters, if enabled |

Status Codes

| Status | Name | Description | Stream Stopped |
|---|---|---|---|
| 0 | Success | Transcription segment received | No |
| -1 | Error | Processing failed | No |
| 14 | User Stopped | User manually stopped the stream | Yes |
| 15 | No Data | Stream has no audio data | Yes |
| 16 | Capacity Reached | Server capacity limit reached | Yes |
| 402 | Insufficient Balance | User balance insufficient | Yes |
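A webhook receiver can apply the status table above to decide when a stream has ended. A minimal sketch in plain Python (no web framework; the payload shape follows the Callback Response Parameters section):

```python
# Statuses after which the stream is stopped, per the status table above.
STREAM_STOPPED = {14, 15, 16, 402}

def handle_callback(payload: dict):
    """Return (transcript_text, stream_stopped) for one webhook callback."""
    data = payload.get("data", {})
    stopped = data.get("status") in STREAM_STOPPED
    transcript = data.get("transcript")  # null/None for error statuses
    text = transcript.get("text") if transcript else None
    return text, stopped

# Example: a successful 5-second segment
callback = {
    "code": 200,
    "message": "SUCCESS",
    "task_id": "660e8400-e29b-41d4-a716-446655440001",
    "data": {"segment_index": 0, "status": 0,
             "transcript": {"text": "Hello world."}},
}
print(handle_callback(callback))  # ('Hello world.', False)
```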

Callback Flow

User Request → Start Stream → Transcribe Audio (every 5s) → Webhook Callbacks
                    ↓                                              ↓
            Pre-charge for                               Transcription Result
            first segment                                   with full data

            Charge per segment

Important Notes

  • Webhook Configuration: Configure your webhook URL in user settings before using this API
  • No callback parameter needed: The system uses a unified asynchronous callback mechanism
  • Segmented processing: Audio is processed in 5-second segments with callbacks for each segment
  • Full AssemblyAI data: Each callback contains the complete AssemblyAI transcription object
  • Speaker labels: Enable speaker_labels to identify different speakers (A, B, C, etc.)
  • Pre-charge mechanism: Balance is checked and pre-charged at start
  • Per-segment charging: Each processed segment is charged immediately ($0.00208 per 5-second segment)
  • Auto-stop on insufficient balance: Stream automatically stops if balance is insufficient (status 402)
  • AssemblyAI features: All AssemblyAI transcription features are supported
  • Language support: Supports multiple languages via language_code parameter

Supported Languages

Common language codes:
  • en - English
  • zh - Chinese
  • es - Spanish
  • fr - French
  • de - German
  • ja - Japanese
  • ko - Korean
  • And many more…

Rate Limiting

  • Maximum concurrent streams: Each user can run a fixed number of concurrent stream tasks (video + audio combined)
  • Capacity check: Returns status 16 if server capacity is reached
  • Balance check: Returns HTTP 402 if insufficient balance at start

Authorizations

Authorization (string, header, required): API key sent in the Authorization header.
