Start real-time audio stream transcription with ElevenLabs or AssemblyAI, selected via the `provider` parameter.
This endpoint starts real-time audio stream transcription. The server pulls audio from the provided stream URL, decodes it to PCM via FFmpeg, and streams it to the selected provider (ElevenLabs or AssemblyAI) over WebSocket. Every message returned by the provider is forwarded verbatim to your webhook callback. Billing occurs in 5-second increments of audio streamed.
Pricing by provider:

| Provider | Hourly rate | Per 5s billing cycle | Per minute |
|---|---|---|---|
| AssemblyAI | $0.15/hour | $0.000208 | $0.0025 |
| ElevenLabs | $0.39/hour | $0.000542 | $0.0065 |
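The per-cycle and per-minute figures above follow directly from the hourly rates, since one hour contains 720 five-second billing cycles. A quick sketch of the arithmetic (rounding up to whole 5-second cycles is an assumption about how partial cycles are billed):

```python
import math

# Hourly rates per provider, from the pricing table above.
HOURLY_RATE = {"assemblyai": 0.15, "elevenlabs": 0.39}

CYCLES_PER_HOUR = 3600 / 5  # 720 five-second billing cycles per hour

def cost_per_cycle(provider: str) -> float:
    """Cost of one 5-second billing cycle for the given provider."""
    return HOURLY_RATE[provider] / CYCLES_PER_HOUR

def cost_for_duration(provider: str, seconds: float) -> float:
    """Estimated cost of a stream, billed in whole 5-second cycles."""
    cycles = math.ceil(seconds / 5)
    return cycles * cost_per_cycle(provider)
```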
The billed amount is derived from two values: `duration`, the audio duration in seconds, and `rate`, the hourly rate of the selected provider (e.g., 0.39 for ElevenLabs, determined by the `provider` parameter).

Request body parameters:

| Parameter | Type | Description |
|---|---|---|
| audio_url | string | Audio stream URL (RTMP, RTSP, HLS, or HTTP) |
| provider | string | Transcription provider: elevenlabs or assemblyai |
| Parameter | Type | Default | Description |
|---|---|---|---|
| language_code | string | - | Language code for transcription (e.g., en, zh, es, fr) |
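Putting the required and common parameters together, a start request might look like the sketch below. The endpoint path (`/audio-stream/start`), base URL, and `Authorization` header are assumptions for illustration and are not confirmed by this page; check the API reference for the actual values.

```python
import json
import urllib.request

API_BASE = "https://api-tools.memories.ai"  # assumed base URL
API_KEY = "your-api-key"                    # placeholder credential

payload = {
    "audio_url": "rtmp://example.com/live/audio",  # RTMP, RTSP, HLS, or HTTP stream
    "provider": "assemblyai",                      # required: elevenlabs or assemblyai
    "language_code": "en",                         # optional
}

req = urllib.request.Request(
    f"{API_BASE}/audio-stream/start",  # hypothetical path
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": f"Bearer {API_KEY}"},
    method="POST",
)
# resp = urllib.request.urlopen(req)  # uncomment with a valid key and live stream
```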
The following parameters apply only when `provider=elevenlabs`:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_id | string | scribe_v2_realtime | Model to use for transcription |
| tag_audio_events | boolean | false | Tag audio events (music, laughter, etc.) |
| num_speakers | integer | - | Expected number of speakers for diarization |
| diarize | boolean | false | Enable speaker diarization |
| enable_logging | boolean | false | Enable server-side logging |
| inactivity_timeout | integer | - | Session timeout (seconds) when no audio is received |
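For illustration, the ElevenLabs-specific options above could be combined into a request body like this (parameter names come from the table; the values are examples, not recommendations):

```python
payload = {
    "audio_url": "rtmp://example.com/live/audio",
    "provider": "elevenlabs",
    "model_id": "scribe_v2_realtime",  # default model
    "diarize": True,                   # enable speaker diarization...
    "num_speakers": 2,                 # ...with an expected speaker count
    "tag_audio_events": True,          # tag music, laughter, etc.
    "inactivity_timeout": 60,          # end the session after 60s without audio
}
```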
The following parameters apply only when `provider=assemblyai`:
| Parameter | Type | Default | Description |
|---|---|---|---|
| punctuate | boolean | false | Add punctuation to the transcript |
| format_text | boolean | false | Format text in the transcript (numbers, dates) |
| language_detection | boolean | false | Enable automatic language detection |
| language_confidence_threshold | number | 0.0 | Confidence threshold for language detection (0.0-1.0) |
| audio_start_from | integer | 0 | Start transcription from this timestamp (ms) |
| audio_end_at | integer | 0 | End transcription at this timestamp (ms) |
| multichannel | boolean | false | Enable multi-channel audio processing |
| speech_models | array | - | Array of speech models to use |
| speech_threshold | number | 0.0 | Speech detection threshold (0.0-1.0) |
| disfluencies | boolean | false | Include disfluencies (um, uh) in transcript |
| speaker_labels | boolean | false | Enable speaker diarization |
| speakers_expected | integer | 0 | Expected number of speakers |
| sentiment_analysis | boolean | false | Enable sentiment analysis |
| entity_detection | boolean | false | Enable entity detection |
| auto_highlights | boolean | false | Enable automatic highlights extraction |
| content_safety | boolean | false | Enable content safety detection |
| iab_categories | boolean | false | Enable IAB category classification |
| auto_chapters | boolean | false | Enable automatic chapter generation |
| summarization | boolean | false | Enable automatic summarization |
| summary_model | string | - | Model for summarization (informative or conversational) |
| summary_type | string | - | Summary type (bullets or paragraph) |
| custom_topics | boolean | false | Enable custom topic detection |
| topics | array | - | Array of custom topics to detect |
| redact_pii | boolean | false | Redact personally identifiable information |
| redact_pii_sub | string | - | PII redaction substitution method |
| redact_pii_policies | array | - | Array of PII policies to apply |
| redact_pii_audio | boolean | false | Redact PII from audio |
| redact_pii_audio_quality | string | - | Quality for PII audio redaction |
| filter_profanity | boolean | false | Filter profanity from transcript |
| custom_spelling | array | - | Array of custom spelling corrections |
| speech_understanding | object | - | Speech understanding configuration |
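Similarly, an AssemblyAI request body might combine formatting, diarization, summarization, and PII redaction options like this sketch (values are examples; the `redact_pii_policies` entries shown are illustrative policy names, so verify them against the AssemblyAI documentation):

```python
payload = {
    "audio_url": "https://example.com/stream.m3u8",  # HLS stream
    "provider": "assemblyai",
    "punctuate": True,
    "format_text": True,
    "speaker_labels": True,             # speaker diarization...
    "speakers_expected": 2,             # ...with an expected speaker count
    "summarization": True,
    "summary_model": "conversational",  # or "informative"
    "summary_type": "bullets",          # or "paragraph"
    "redact_pii": True,
    "redact_pii_policies": ["person_name", "phone_number"],  # example policies
}
```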
Response fields:

| Parameter | Type | Description |
|---|---|---|
| code | string | Response code (200 indicates success) |
| message | string | Response message |
| data.task_id | string | Unique identifier of the transcription task |
| data.message | string | Status message about the stream start |
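A successful start response can be checked and the `task_id` captured for later use (for example, to stop the stream). A minimal sketch, using an invented example body; note the table types `code` as a string, though your client may receive it as a number:

```python
import json

# Example response body (illustrative values).
raw = ('{"code": "200", "message": "SUCCESS", '
       '"data": {"task_id": "task-123", "message": "stream started"}}')

resp = json.loads(raw)
if str(resp["code"]) == "200":
    task_id = resp["data"]["task_id"]  # keep this to stop the stream later
```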
Webhook callback payload fields:

| Parameter | Type | Description |
|---|---|---|
| code | string | Response code (200 indicates callback delivery success) |
| message | string | Response message (“SUCCESS”) |
| task_id | string | The task ID associated with this stream |
| data.status | integer | Status code (0 for normal message, see Status Codes below) |
| data.message | string | Status message (null for normal messages) |
| data.transcript | object | Verbatim JSON from the upstream provider (null for error/control statuses) |
Callback status codes:

| Status | Name | Description | Stream Continues |
|---|---|---|---|
| 0 | Message | Normal transcription message from provider | Yes |
| -1 | Error | Processing or connection error | No |
| 14 | User Stopped | User called /audio-stream/stop | No |
| 16 | Capacity Reached | Server capacity limit reached | No |
| 402 | Insufficient Balance | User balance insufficient | No |
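A webhook receiver can branch on `data.status` to decide whether the stream continues; a minimal sketch of that dispatch:

```python
# Terminal statuses from the table above; anything else non-zero is unexpected.
TERMINAL = {-1: "error", 14: "user stopped",
            16: "capacity reached", 402: "insufficient balance"}

def handle_callback(body: dict) -> bool:
    """Process one callback payload; return True while the stream continues."""
    data = body["data"]
    status = data["status"]
    if status == 0:
        transcript = data["transcript"]  # verbatim provider JSON
        # ... consume transcript ...
        return True
    reason = TERMINAL.get(status, "unknown")
    print(f"stream ended ({reason}): {data.get('message')}")
    return False
```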
Note that `provider` is required: you must specify `elevenlabs` or `assemblyai`; without it the request will fail.

Supported language codes include:

- `en` - English
- `zh` - Chinese
- `es` - Spanish
- `fr` - French
- `de` - German
- `ja` - Japanese
- `ko` - Korean
"rtmp://example.com/live/audio"
Language code for transcription
"en"
Add punctuation to the transcript
Format text in the transcript
Enable automatic language detection
Confidence threshold for language detection
Start transcription from this timestamp (milliseconds)
End transcription at this timestamp (milliseconds)
Enable multi-channel audio processing
Array of speech models to use
Speech detection threshold (0.0-1.0)
Include disfluencies in the transcript
Enable speaker diarization
Expected number of speakers
Enable sentiment analysis
Enable entity detection
Enable automatic highlights extraction
Enable content safety detection
Enable IAB category classification
Enable automatic chapter generation
Enable automatic summarization
Model to use for summarization
Type of summary to generate
Enable custom topic detection
Array of custom topics to detect
Redact personally identifiable information
PII redaction substitution method
Array of PII policies to apply
Redact PII from audio
Quality setting for PII audio redaction
Filter profanity from transcript
Array of custom spelling corrections
Speech understanding configuration