Start real-time audio stream transcription with speaker identification and advanced features.
duration: Audio duration in seconds (5 seconds per segment).

Request parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| audio_url | string | Yes | - | Audio stream URL (RTMP or RTSP protocol) |
| language_code | string | No | - | Language code for transcription (e.g., en, zh, es, fr) |
| punctuate | boolean | No | false | Add punctuation to the transcript |
| format_text | boolean | No | false | Format text in the transcript |
| language_detection | boolean | No | false | Enable automatic language detection |
| language_confidence_threshold | number | No | 0.0 | Confidence threshold for language detection |
| audio_start_from | integer | No | 0 | Start transcription from this timestamp (milliseconds) |
| audio_end_at | integer | No | 0 | End transcription at this timestamp (milliseconds) |
| multichannel | boolean | No | false | Enable multi-channel audio processing |
| speech_models | array | No | - | Array of speech models to use |
| speech_threshold | number | No | 0.0 | Speech detection threshold (0.0-1.0) |
| disfluencies | boolean | No | false | Include disfluencies in the transcript |
| speaker_labels | boolean | No | false | Enable speaker diarization |
| speakers_expected | integer | No | 0 | Expected number of speakers |
| sentiment_analysis | boolean | No | false | Enable sentiment analysis |
| entity_detection | boolean | No | false | Enable entity detection |
| auto_highlights | boolean | No | false | Enable automatic highlights extraction |
| content_safety | boolean | No | false | Enable content safety detection |
| iab_categories | boolean | No | false | Enable IAB category classification |
| auto_chapters | boolean | No | false | Enable automatic chapter generation |
| summarization | boolean | No | false | Enable automatic summarization |
| summary_model | string | No | - | Model to use for summarization |
| summary_type | string | No | - | Type of summary to generate |
| custom_topics | boolean | No | false | Enable custom topic detection |
| topics | array | No | - | Array of custom topics to detect |
| redact_pii | boolean | No | false | Redact personally identifiable information |
| redact_pii_sub | string | No | - | PII redaction substitution method |
| redact_pii_policies | array | No | - | Array of PII policies to apply |
| redact_pii_audio | boolean | No | false | Redact PII from audio |
| redact_pii_audio_quality | string | No | - | Quality setting for PII audio redaction |
| filter_profanity | boolean | No | false | Filter profanity from transcript |
| custom_spelling | array | No | - | Array of custom spelling corrections |
| speech_understanding | object | No | - | Speech understanding configuration |
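The request body is a JSON object built from the parameters above. A minimal sketch of assembling and validating it, in Python; the endpoint URL and auth scheme are not given in this document, so only the payload construction is shown:

```python
import json

def build_start_request(audio_url, **options):
    """Build the JSON body for starting a stream transcription.

    Only audio_url is required; all other parameters fall back to the
    documented server-side defaults.
    """
    if not audio_url.startswith(("rtmp://", "rtsp://")):
        raise ValueError("audio_url must use the RTMP or RTSP protocol")
    payload = {"audio_url": audio_url}
    payload.update(options)  # e.g. speaker_labels, language_code, punctuate
    return json.dumps(payload)

body = build_start_request(
    "rtmp://example.com/live/audio",
    speaker_labels=True,
    speakers_expected=2,
    punctuate=True,
)
```

Omitted optional parameters are simply left out of the body rather than sent with placeholder values.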
Response parameters (stream start):

| Parameter | Type | Description |
|---|---|---|
| code | string | Response code (200 indicates success) |
| message | string | Response message describing the operation result |
| data | object | Response data object containing task information |
| data.task_id | string | Unique identifier of the transcription task |
| data.message | string | Status message about the stream start |
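A sketch of reading the stream-start response, assuming the shape in the table above. The `code` field is documented as a string type with 200 indicating success, so both `200` and `"200"` are accepted here:

```python
import json

def parse_start_response(raw: str) -> str:
    """Extract data.task_id from a stream-start response.

    Raises RuntimeError when code is not the documented success value.
    """
    resp = json.loads(raw)
    if resp.get("code") not in (200, "200"):
        raise RuntimeError(f"stream start failed: {resp.get('message')}")
    return resp["data"]["task_id"]

# Illustrative response shaped like the table above; values are examples.
raw = json.dumps({
    "code": 200,
    "message": "SUCCESS",
    "data": {"task_id": "task-123", "message": "stream started"},
})
task_id = parse_start_response(raw)
```

The returned `task_id` is what ties subsequent per-segment results back to this stream.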
Per-segment result parameters:

| Parameter | Type | Description |
|---|---|---|
| code | string | Response code (200 indicates success) |
| message | string | Response message ("SUCCESS") |
| task_id | string | The task ID associated with this stream |
| data | object | Wrapper object containing transcription results |
| data.segment_index | integer | Index of the processed segment |
| data.segment_start_time | string | Segment start time in seconds |
| data.segment_end_time | string | Segment end time in seconds |
| data.status | integer | Status code (0 for success, see Status Codes) |
| data.message | string | Status message describing the result |
| data.transcript | object | AssemblyAI transcription object (null for error statuses) |
| data.transcript.audio_url | string | URL of the audio stream |
| data.transcript.id | string | AssemblyAI transcription ID |
| data.transcript.status | string | Transcription status ("completed") |
| data.transcript.language_code | string | Detected or specified language code |
| data.transcript.confidence | number | Overall confidence score (0-1) |
| data.transcript.text | string | Full transcribed text |
| data.transcript.words | array | Array of word-level details with timestamps |
| data.transcript.utterances | array | Array of utterances (segments by speaker) |
| data.transcript.speaker_labels | boolean | Whether speaker identification was enabled |
| data.transcript.speakers_expected | integer | Expected number of speakers |
| data.transcript.sentiment_analysis | boolean | Whether sentiment analysis was enabled |
| data.transcript.auto_highlights | boolean | Whether auto highlights was enabled |
| data.transcript.auto_highlights_result | object | Auto highlights results |
| data.transcript.iab_categories | boolean | Whether IAB categorization was enabled |
| data.transcript.content_safety | boolean | Whether content safety detection was enabled |
| data.transcript.summary | string | Generated summary if enabled |
| data.transcript.chapters | array | Auto-generated chapters if enabled |
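A sketch of consuming one per-segment result, assuming the shape in the table above. Since `data.transcript` is null for error statuses, non-zero statuses are filtered out first; the utterance fields assumed here (`speaker`, `text`) follow the AssemblyAI transcript object:

```python
import json

def extract_segment_text(raw: str):
    """Return (segment_index, full text, per-speaker lines) for a segment.

    Returns None for non-zero statuses, where data.transcript is null.
    """
    data = json.loads(raw)["data"]
    if data["status"] != 0:
        return None
    transcript = data["transcript"]
    speaker_lines = [
        f"Speaker {u['speaker']}: {u['text']}"
        for u in (transcript.get("utterances") or [])
    ]
    return data["segment_index"], transcript["text"], speaker_lines

# Illustrative per-segment payload; field values are examples only.
segment = json.dumps({
    "code": 200,
    "message": "SUCCESS",
    "task_id": "task-123",
    "data": {
        "segment_index": 0,
        "segment_start_time": "0",
        "segment_end_time": "5",
        "status": 0,
        "message": "ok",
        "transcript": {
            "text": "Hello there. Hi.",
            "utterances": [
                {"speaker": "A", "text": "Hello there."},
                {"speaker": "B", "text": "Hi."},
            ],
        },
    },
})
index, text, lines = extract_segment_text(segment)
```

With `speaker_labels` disabled, `utterances` may be absent or null, which the `or []` guard handles.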
Status Codes:

| Status | Name | Description | Stream Stopped |
|---|---|---|---|
| 0 | Success | Transcription segment received | No |
| -1 | Error | Processing failed | No |
| 14 | User Stopped | User manually stopped the stream | Yes |
| 15 | No Data | Stream has no audio data | Yes |
| 16 | Capacity Reached | Server capacity limit reached | Yes |
| 402 | Insufficient Balance | User balance insufficient | Yes |
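The "Stream Stopped" column tells a client whether to keep listening for further segments. A minimal lookup derived from the table above; treating unknown statuses as terminal is an assumption made here, not stated by the API:

```python
# Derived from the Status Codes table: which statuses mean the stream
# has stopped and no further segments will arrive.
STREAM_STOPPED = {
    0: False,    # Success: segment received, stream continues
    -1: False,   # Error: processing failed, stream continues
    14: True,    # User Stopped
    15: True,    # No Data
    16: True,    # Capacity Reached
    402: True,   # Insufficient Balance
}

def stream_stopped(status: int) -> bool:
    """Unknown statuses are treated as terminal (the conservative choice)."""
    return STREAM_STOPPED.get(status, True)
```

Note that `-1` (Error) does not end the stream: a client should log the failed segment and keep listening.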
Notes:

- Enable speaker_labels to identify different speakers (A, B, C, etc.).
- Supported values for the language_code parameter include: en (English), zh (Chinese), es (Spanish), fr (French), de (German), ja (Japanese), ko (Korean).
- Example audio_url (RTMP or RTSP protocol): "rtmp://example.com/live/audio"
- Example language_code: "en"