**Access Required**: To use this API endpoint, please contact us at contact@memories.ai to enable stream processing features for your account.
This endpoint provides a direct WebSocket proxy to ElevenLabs or AssemblyAI real-time speech-to-text services. Your client connects to our WebSocket gateway, and we transparently forward all frames to and from the upstream provider. You send audio and receive transcription results in real time.
When to use this vs /audio-stream/start?
Use this WebSocket endpoint when your client can send audio directly (e.g., browser microphone, mobile app).
Use /audio-stream/start when you have a stream URL (RTMP/RTSP/HLS) and want the server to pull and process it for you.
Pricing (varies by provider):

| Provider   | Rate       | Per minute |
|------------|------------|------------|
| AssemblyAI | $0.15/hour | $0.0025    |
| ElevenLabs | $0.39/hour | $0.0065    |
Charges are calculated for every 5 seconds of audio streamed through the WebSocket connection.
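The rates above can be turned into a quick cost estimate. This is an illustration only: `estimated_cost` is a hypothetical helper, not part of any SDK, and rounding up to the next 5-second increment is an assumption; confirm the exact rounding rule for your account.

```python
import math

# Rates from the pricing table above, in USD per minute.
RATE_PER_MINUTE = {"assemblyai": 0.0025, "elevenlabs": 0.0065}

def estimated_cost(provider: str, seconds_streamed: float) -> float:
    """Estimate USD cost for a stream, billed in 5-second increments.

    Rounding up to the next increment is an assumption of this sketch.
    """
    billed_seconds = math.ceil(seconds_streamed / 5) * 5
    return billed_seconds / 60 * RATE_PER_MINUTE[provider]

# 62 s of audio rounds up to 65 billed seconds
cost = estimated_cost("elevenlabs", 62)
```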
Connection
```
wss://mavi-backend.memories.ai/serve/ws/v2/audio-stream
    ?provider=elevenlabs
    &api_key=sk-mai-xxx
    &model_id=scribe_v2_realtime
    &language_code=en
```
| Parameter    | Required | Description |
|--------------|----------|-------------|
| `provider`   | Yes      | `elevenlabs` or `assemblyai` |
| `api_key`    | Yes      | Your API V2 Key (can also be sent via Authorization header) |
| other params | No       | All additional query parameters are forwarded to the upstream provider |
Authentication
Two options (pick one):
Query parameter : ?api_key=sk-mai-xxx
HTTP header : Authorization: sk-mai-xxx (sent during WebSocket handshake)
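A minimal sketch of the two auth options. `handshake_args` is a hypothetical helper, not part of any SDK; it just assembles the URL and header dict for whichever option you pick, for use with your WebSocket client.

```python
BASE_URL = "wss://mavi-backend.memories.ai/serve/ws/v2/audio-stream"

def handshake_args(provider: str, api_key: str, use_header: bool = True):
    """Return (url, headers) for header-based or query-parameter auth."""
    if use_header:
        # Option 2: Authorization header sent during the WebSocket handshake
        return f"{BASE_URL}?provider={provider}", {"Authorization": api_key}
    # Option 1: api_key as a query parameter
    return f"{BASE_URL}?provider={provider}&api_key={api_key}", {}

url, headers = handshake_args("elevenlabs", "sk-mai-xxx")
# e.g. with the `websockets` package:
#   websockets.connect(url, additional_headers=headers)  # recent versions
#   websockets.connect(url, extra_headers=headers)       # older releases
```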
Error Codes during Handshake
| HTTP Status | Reason |
|-------------|--------|
| 401 | Missing or invalid `api_key` |
| 403 | User not authorized for stream processing |
| 400 | Missing or invalid `provider` parameter |
Provider-Specific Parameters
ElevenLabs
All parameters are passed as URL query strings and forwarded to wss://api.elevenlabs.io/v1/speech-to-text/realtime.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model_id` | string | `scribe_v2_realtime` | Model to use for transcription |
| `language_code` | string | - | ISO language code (e.g., `en`, `zh`) |
| `tag_audio_events` | boolean | `false` | Tag audio events like music, laughter |
| `num_speakers` | integer | - | Expected number of speakers |
| `diarize` | boolean | `false` | Enable speaker diarization |
| `enable_logging` | boolean | `false` | Enable server-side logging |
| `inactivity_timeout` | integer | - | Session timeout in seconds when no audio is received |
**Audio format**: Send audio as JSON text frames:

```json
{
  "message_type": "input_audio_chunk",
  "audio_base_64": "<base64-encoded PCM audio>",
  "commit": false,
  "sample_rate": 16000
}
```
**End session**: Send a commit message:

```json
{
  "message_type": "input_audio_chunk",
  "audio_base_64": "",
  "commit": true
}
```
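The two frame shapes above can be wrapped in small builders. `audio_chunk_frame` and `commit_frame` are illustrative helpers, not part of any SDK:

```python
import base64
import json

def audio_chunk_frame(pcm: bytes, sample_rate: int = 16000) -> str:
    """Wrap a raw PCM chunk in an input_audio_chunk text frame."""
    return json.dumps({
        "message_type": "input_audio_chunk",
        "audio_base_64": base64.b64encode(pcm).decode(),
        "commit": False,
        "sample_rate": sample_rate,
    })

def commit_frame() -> str:
    """Build the final commit frame that ends the session."""
    return json.dumps({
        "message_type": "input_audio_chunk",
        "audio_base_64": "",
        "commit": True,
    })
```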
AssemblyAI
All parameters are passed as URL query strings and forwarded to wss://streaming.assemblyai.com/v3/ws.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `sample_rate` | integer | `16000` | Audio sample rate in Hz |
| `encoding` | string | `pcm_s16le` | Audio encoding format |
| `language_code` | string | - | ISO language code |
| `disable_partial_transcripts` | boolean | `false` | Only receive final transcripts |
| `enable_extra_session_information` | boolean | `false` | Include extra session info |
**Audio format**: Send raw PCM audio as **binary frames**.
**End session**: Send a text frame:

```json
{
  "type": "session_termination"
}
```
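A minimal sketch of the AssemblyAI send loop under these rules: audio goes out as binary frames with no JSON wrapping, and only the termination message is a text frame. `stream_to_assemblyai` is a hypothetical helper; `ws` is any object with an async `send()`, such as a `websockets` connection (a stand-in recorder is used below for illustration).

```python
import asyncio
import json

async def stream_to_assemblyai(ws, pcm_chunks):
    """Send raw s16le PCM chunks as binary frames, then terminate."""
    for chunk in pcm_chunks:
        await ws.send(chunk)  # binary frame, no JSON wrapping
    await ws.send(json.dumps({"type": "session_termination"}))

# Demo with a stand-in connection that just records outgoing frames
class _RecordingWS:
    def __init__(self):
        self.sent = []
    async def send(self, frame):
        self.sent.append(frame)

demo = _RecordingWS()
asyncio.run(stream_to_assemblyai(demo, [b"\x00\x01", b"\x02\x03"]))
```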
Data Flow
```
Client (browser/app)               Memories.ai Gateway             Provider (ElevenLabs/AssemblyAI)
      |                                    |                                    |
      |── WebSocket connect ──────────────>|                                    |
      |   ?provider=elevenlabs&api_key=xxx |                                    |
      |                                    |── Validate API Key                 |
      |                                    |── Connect upstream ───────────────>|
      |<── Connection established ─────────|<── Connection established ─────────|
      |                                    |                                    |
      |── Audio frame ────────────────────>|── Forward audio ──────────────────>|
      |── Audio frame ────────────────────>|── Forward audio ──────────────────>|
      |                                    |                                    |
      |<── Transcription result ───────────|<── Transcription result ───────────|
      |<── Transcription result ───────────|<── Transcription result ───────────|
      |                                    |                                    |
      |── Close ──────────────────────────>|── Close ──────────────────────────>|
```
Code Examples
Python (ElevenLabs)
```python
import asyncio
import base64
import json

import websockets

WS_URL = "wss://mavi-backend.memories.ai/serve/ws/v2/audio-stream"
API_KEY = "sk-mai-your_api_key"

async def stream_audio():
    params = (
        "?provider=elevenlabs"
        f"&api_key={API_KEY}"
        "&model_id=scribe_v2_realtime"
        "&language_code=en"
    )
    async with websockets.connect(WS_URL + params) as ws:
        # Start receiving task
        async def receive():
            async for message in ws:
                data = json.loads(message)
                print(json.dumps(data, indent=2))

        recv_task = asyncio.create_task(receive())

        # Send audio chunks (16 kHz, 16-bit, mono PCM)
        with open("audio.pcm", "rb") as f:
            while chunk := f.read(3200):  # 100 ms chunks
                msg = json.dumps({
                    "message_type": "input_audio_chunk",
                    "audio_base_64": base64.b64encode(chunk).decode(),
                    "commit": False,
                    "sample_rate": 16000,
                })
                await ws.send(msg)
                await asyncio.sleep(0.1)

        # Send commit to finalize
        await ws.send(json.dumps({
            "message_type": "input_audio_chunk",
            "audio_base_64": "",
            "commit": True,
        }))
        await recv_task

asyncio.run(stream_audio())
```
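For completeness, a sketch of the same client flow against the AssemblyAI provider, following the frame rules above: binary audio frames out, a JSON termination frame to end. `build_url` is a hypothetical helper; the query parameters come from the AssemblyAI table above, and the `websockets` package is assumed.

```python
import asyncio
import json

WS_URL = "wss://mavi-backend.memories.ai/serve/ws/v2/audio-stream"
API_KEY = "sk-mai-your_api_key"

def build_url(api_key: str) -> str:
    """Assemble the connection URL for the AssemblyAI provider."""
    return (
        f"{WS_URL}?provider=assemblyai"
        f"&api_key={api_key}"
        "&sample_rate=16000"
        "&encoding=pcm_s16le"
    )

async def stream_audio():
    import websockets  # pip install websockets

    async with websockets.connect(build_url(API_KEY)) as ws:
        async def receive():
            async for message in ws:
                print(json.dumps(json.loads(message), indent=2))

        recv_task = asyncio.create_task(receive())

        # Raw PCM goes out as binary frames, with no JSON wrapping
        with open("audio.pcm", "rb") as f:
            while chunk := f.read(3200):  # 100 ms at 16 kHz s16le
                await ws.send(chunk)
                await asyncio.sleep(0.1)

        # End the session with a text frame
        await ws.send(json.dumps({"type": "session_termination"}))
        await recv_task

# To run: asyncio.run(stream_audio())
```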
Response Messages
All transcription results from the upstream provider are forwarded to your client as-is (verbatim JSON). The message format depends on the selected provider.
ElevenLabs Response Example
```json
{
  "message_type": "transcript",
  "language_code": "en",
  "language_probability": 0.98,
  "text": "Hello, how are you?",
  "words": [
    {
      "text": "Hello",
      "start": 0.0,
      "end": 0.5,
      "type": "word"
    }
  ]
}
```
AssemblyAI Response Example
```json
{
  "type": "final_transcript",
  "text": "Hello, how are you?",
  "words": [
    {
      "text": "Hello",
      "start": 100,
      "end": 500,
      "confidence": 0.99
    }
  ],
  "created": "2025-01-01T00:00:00.000Z"
}
```
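Since messages arrive verbatim from whichever provider you selected, a client often wants one place to pull the final text out of either shape. `extract_text` is a hypothetical helper keyed on the fields in the two examples above; it is not part of the API.

```python
import json

def extract_text(raw: str):
    """Return the transcript text from either provider's JSON, or None."""
    msg = json.loads(raw)
    if msg.get("message_type") == "transcript":  # ElevenLabs transcript
        return msg.get("text")
    if msg.get("type") == "final_transcript":    # AssemblyAI final transcript
        return msg.get("text")
    return None  # partial transcripts, session metadata, etc.
```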
Important Notes
- **Per-provider billing**: AssemblyAI is billed at $0.15/hour, ElevenLabs at $0.39/hour. Charges are calculated for every 5 seconds of audio streamed.
- **Transparent proxy**: All frames are forwarded as-is in both directions. The gateway does not modify any data.
- **Concurrent connections**: Subject to your account's stream processing limits.
- **Unsupported parameters are ignored**: If you pass a parameter that doesn't apply to the selected provider, the provider will silently ignore it.
- **Recommended audio format**: 16 kHz sample rate, 16-bit signed little-endian, mono channel (PCM s16le).
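To exercise the stream without a recording, you can synthesize audio in the recommended format with the standard library alone. `sine_pcm` is a hypothetical helper for local testing, not part of the API.

```python
import math
import struct

SAMPLE_RATE = 16000  # recommended rate, per the notes above

def sine_pcm(freq: float = 440.0, seconds: float = 1.0) -> bytes:
    """Generate a mono s16le sine tone at SAMPLE_RATE, `seconds` long."""
    n = int(SAMPLE_RATE * seconds)
    return b"".join(
        struct.pack("<h", int(16000 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)))
        for i in range(n)
    )

pcm = sine_pcm()  # 1 s = 16000 samples = 32000 bytes
```

In practice, an existing recording can be converted to the same format with ffmpeg: `ffmpeg -i input.wav -ar 16000 -ac 1 -f s16le audio.pcm`.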