Real-time Audio Stream (WebSocket)

Access Required: To use this API endpoint, please contact us at contact@memories.ai to enable stream processing features for your account.

This endpoint provides a direct WebSocket proxy to the ElevenLabs or AssemblyAI real-time speech-to-text services. Your client connects to our WebSocket gateway, and we transparently forward all frames to and from the upstream provider. You send audio and receive transcription results in real time.

When to use this vs /audio-stream/start?

Use this WebSocket endpoint when your client can send audio directly (e.g., browser microphone, mobile app).
Use /audio-stream/start when you have a stream URL (RTMP/RTSP/HLS) and want the server to pull and process it for you.

Pricing (varies by provider):

Provider	Rate per hour	Rate per minute
AssemblyAI	$0.15/hour	$0.0025
ElevenLabs	$0.39/hour	$0.0065

Usage is billed every 5 seconds of audio streamed through the WebSocket connection.
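
As a rough illustration of the pricing arithmetic, the sketch below estimates the cost of a session from its audio duration. It assumes that a partially used 5-second interval is billed as a full one; the rounding rule is not stated here, so treat that as a guess. `RATES_PER_HOUR` and `estimate_cost` are illustrative names, not part of the API.

```python
import math

# Hourly rates in USD, copied from the pricing table above.
RATES_PER_HOUR = {"assemblyai": 0.15, "elevenlabs": 0.39}

BILLING_INTERVAL_S = 5  # charges are calculated every 5 seconds of audio


def estimate_cost(provider: str, audio_seconds: float) -> float:
    """Estimate the session cost in USD.

    Assumption: a partially used 5-second interval is billed in full.
    """
    intervals = math.ceil(audio_seconds / BILLING_INTERVAL_S)
    billed_seconds = intervals * BILLING_INTERVAL_S
    return RATES_PER_HOUR[provider] * billed_seconds / 3600
```

Under these assumptions, one minute of audio works out to the per-minute figures in the table, and 61 seconds would be billed as 65 seconds.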

Connection

wss://mavi-backend.memories.ai/serve/ws/v2/audio-stream
  ?provider=elevenlabs
  &api_key=sk-mai-xxx
  &model_id=scribe_v2_realtime
  &language_code=en

Parameter	Required	Description
provider	Yes	`elevenlabs` or `assemblyai`
api_key	Yes	Your API V2 Key (can also be sent via `Authorization` header)
other params	No	All additional query parameters are forwarded to the upstream provider
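
The connection URL can be assembled programmatically so that parameter values are percent-encoded correctly. This is a minimal sketch; `build_stream_url` is a hypothetical helper, and lower-casing booleans to `true`/`false` is an assumption about what the upstream providers accept.

```python
from urllib.parse import urlencode

BASE_URL = "wss://mavi-backend.memories.ai/serve/ws/v2/audio-stream"


def build_stream_url(provider: str, api_key: str, **provider_params) -> str:
    """Return the full WebSocket URL with percent-encoded query parameters."""
    if provider not in ("elevenlabs", "assemblyai"):
        raise ValueError("provider must be 'elevenlabs' or 'assemblyai'")
    query = {"provider": provider, "api_key": api_key}
    for name, value in provider_params.items():
        # Assumption: booleans are sent as lowercase "true"/"false".
        query[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{BASE_URL}?{urlencode(query)}"
```

For example, `build_stream_url("elevenlabs", "sk-mai-xxx", model_id="scribe_v2_realtime", language_code="en")` yields the URL shown at the top of this section, on a single line.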

Authentication

Two options (pick one):

Query parameter: ?api_key=sk-mai-xxx
HTTP header: Authorization: sk-mai-xxx (sent during WebSocket handshake)
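
The two options can be sketched as a single helper that returns the URL and the handshake headers. `handshake_args` is a hypothetical name; the header value is the raw key with no scheme prefix, mirroring the form shown above.

```python
from urllib.parse import urlencode

BASE_URL = "wss://mavi-backend.memories.ai/serve/ws/v2/audio-stream"


def handshake_args(api_key: str, provider: str, use_header: bool = True):
    """Return (url, headers) for the WebSocket handshake.

    With use_header=True the key is sent in the Authorization header and
    kept out of the URL; otherwise it is sent as the api_key query parameter.
    """
    query = {"provider": provider}
    headers = {}
    if use_header:
        headers["Authorization"] = api_key  # raw key, as in the example above
    else:
        query["api_key"] = api_key
    return f"{BASE_URL}?{urlencode(query)}", headers
```

How the headers are passed depends on your WebSocket client. With the Python `websockets` package, recent releases take them through the `additional_headers` argument of `connect`, while the legacy client uses `extra_headers`; check the version you have installed.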

Error Codes during Handshake

HTTP Status	Reason
401	Missing or invalid `api_key`
403	User not authorized for stream processing
400	Missing or invalid `provider` parameter
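
A client can translate these statuses into readable messages when the handshake is rejected. The sketch below is illustrative only: `describe_handshake_failure` is a hypothetical helper, and how you obtain the status code depends on the exception type your WebSocket client raises.

```python
# Handshake statuses and reasons, copied from the table above.
HANDSHAKE_ERRORS = {
    400: "Missing or invalid `provider` parameter",
    401: "Missing or invalid `api_key`",
    403: "User not authorized for stream processing",
}


def describe_handshake_failure(status: int) -> str:
    """Return a human-readable message for a rejected handshake."""
    reason = HANDSHAKE_ERRORS.get(status)
    if reason is None:
        return f"Unexpected handshake status {status}"
    return f"Handshake rejected ({status}): {reason}"
```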

Provider-Specific Parameters

ElevenLabs

All parameters are passed as URL query strings and forwarded to wss://api.elevenlabs.io/v1/speech-to-text/realtime.

Parameter	Type	Default	Description
model_id	string	scribe_v2_realtime	Model to use for transcription
language_code	string	-	ISO language code (e.g., `en`, `zh`)
tag_audio_events	boolean	false	Tag audio events like music, laughter
num_speakers	integer	-	Expected number of speakers
diarize	boolean	false	Enable speaker diarization
enable_logging	boolean	false	Enable server-side logging
inactivity_timeout	integer	-	Session timeout in seconds when no audio is received

Audio format: Send audio as JSON text frames:

{
  "message_type": "input_audio_chunk",
  "audio_base_64": "<base64-encoded PCM audio>",
  "commit": false,
  "sample_rate": 16000
}

End session: Send a commit message:

{
  "message_type": "input_audio_chunk",
  "audio_base_64": "",
  "commit": true
}
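
The two ElevenLabs frames above can be built with small helpers like the following. This is a sketch; `audio_chunk_message` and `commit_message` are hypothetical names, and the field layout simply mirrors the JSON shown above.

```python
import base64
import json


def audio_chunk_message(pcm: bytes, sample_rate: int = 16000) -> str:
    """Wrap raw PCM bytes in an input_audio_chunk JSON text frame."""
    return json.dumps({
        "message_type": "input_audio_chunk",
        "audio_base_64": base64.b64encode(pcm).decode("ascii"),
        "commit": False,
        "sample_rate": sample_rate,
    })


def commit_message() -> str:
    """Build the empty chunk with commit=true that ends the session."""
    return json.dumps({
        "message_type": "input_audio_chunk",
        "audio_base_64": "",
        "commit": True,
    })
```

Each returned string is sent as one WebSocket text frame.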

AssemblyAI

All parameters are passed as URL query strings and forwarded to wss://streaming.assemblyai.com/v3/ws.

Parameter	Type	Default	Description
sample_rate	integer	16000	Audio sample rate in Hz
encoding	string	pcm_s16le	Audio encoding format
language_code	string	-	ISO language code
disable_partial_transcripts	boolean	false	Only receive final transcripts
enable_extra_session_information	boolean	false	Include extra session info

Audio format: Send raw PCM audio as binary frames.

End session: Send a text frame:

{
  "type": "session_termination"
}
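
For AssemblyAI, the send side might look like the sketch below: raw PCM is sliced into binary frames, followed by the termination text frame documented above. The 100 ms frame size and the real-time pacing are assumptions borrowed from the Python example in this page, and `ws` stands for any object with an async `send()` method, such as a connection from the `websockets` package.

```python
import asyncio
import json

BYTES_PER_SAMPLE = 2  # pcm_s16le, mono


def pcm_frames(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 100):
    """Yield consecutive slices of raw PCM, each covering frame_ms of audio."""
    frame_bytes = sample_rate * BYTES_PER_SAMPLE * frame_ms // 1000
    for offset in range(0, len(pcm), frame_bytes):
        yield pcm[offset:offset + frame_bytes]


async def send_assemblyai_audio(ws, pcm: bytes, realtime: bool = True) -> None:
    """Send PCM as binary frames, then the session termination text frame."""
    for frame in pcm_frames(pcm):
        await ws.send(frame)  # bytes are sent as a binary frame
        if realtime:
            await asyncio.sleep(0.1)  # pace the upload at roughly real time
    # A str payload is sent as a text frame.
    await ws.send(json.dumps({"type": "session_termination"}))
```

Set `realtime=False` when replaying a file as fast as possible, for instance in tests.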

Data Flow

Client (browser/app)                    Memories.ai Gateway                    Provider (ElevenLabs/AssemblyAI)
       |                                       |                                       |
       |── WebSocket connect ─────────────────>|                                       |
       |   ?provider=elevenlabs&api_key=xxx    |                                       |
       |                                       |── Validate API Key                    |
       |                                       |── Connect upstream ──────────────────>|
       |<── Connection established ────────────|<── Connection established ────────────|
       |                                       |                                       |
       |── Audio frame ──────────────────────->|── Forward audio ────────────────────->|
       |── Audio frame ──────────────────────->|── Forward audio ────────────────────->|
       |                                       |                                       |
       |<── Transcription result ──────────────|<── Transcription result ──────────────|
       |<── Transcription result ──────────────|<── Transcription result ──────────────|
       |                                       |                                       |
       |── Close ─────────────────────────────>|── Close ─────────────────────────────>|

Code Examples

import asyncio
import base64
import json
from urllib.parse import urlencode

import websockets  # third-party package: pip install websockets

WS_URL = "wss://mavi-backend.memories.ai/serve/ws/v2/audio-stream"
API_KEY = "sk-mai-your_api_key"

async def stream_audio():
    # Build the query string; urlencode percent-escapes values where needed
    params = urlencode({
        "provider": "elevenlabs",
        "api_key": API_KEY,
        "model_id": "scribe_v2_realtime",
        "language_code": "en",
    })

    async with websockets.connect(f"{WS_URL}?{params}") as ws:
        # Start receiving task
        async def receive():
            async for message in ws:
                data = json.loads(message)
                print(json.dumps(data, indent=2))

        recv_task = asyncio.create_task(receive())

        # Send audio chunks (16kHz, 16-bit, mono PCM)
        with open("audio.pcm", "rb") as f:
            while chunk := f.read(3200):  # 100ms chunks
                msg = json.dumps({
                    "message_type": "input_audio_chunk",
                    "audio_base_64": base64.b64encode(chunk).decode(),
                    "commit": False,
                    "sample_rate": 16000
                })
                await ws.send(msg)
                await asyncio.sleep(0.1)

        # Send commit to finalize
        await ws.send(json.dumps({
            "message_type": "input_audio_chunk",
            "audio_base_64": "",
            "commit": True
        }))

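        # receive() returns once the connection is closed, so this waits
        # for any remaining transcription results
        await recv_task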

asyncio.run(stream_audio())

Response Messages

All transcription results from the upstream provider are forwarded to your client as-is (verbatim JSON). The message format depends on the selected provider.

ElevenLabs Response Example

{
  "message_type": "transcript",
  "language_code": "en",
  "language_probability": 0.98,
  "text": "Hello, how are you?",
  "words": [
    {
      "text": "Hello",
      "start": 0.0,
      "end": 0.5,
      "type": "word"
    }
  ]
}

AssemblyAI Response Example

{
  "type": "final_transcript",
  "text": "Hello, how are you?",
  "words": [
    {
      "text": "Hello",
      "start": 100,
      "end": 500,
      "confidence": 0.99
    }
  ],
  "created": "2025-01-01T00:00:00.000Z"
}
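
Because the message shape differs by provider, a client usually needs a small dispatch step. The sketch below handles only the two message types shown in the examples above; other message types that a provider may send are returned as `None`. `transcript_text` is a hypothetical helper name.

```python
import json
from typing import Optional


def transcript_text(raw: str) -> Optional[str]:
    """Return the transcript text from a provider message, or None."""
    msg = json.loads(raw)
    # ElevenLabs messages carry "message_type"; AssemblyAI messages carry "type".
    kind = msg.get("message_type") or msg.get("type")
    if kind in ("transcript", "final_transcript"):
        return msg.get("text")
    return None
```

Note that the word timings in the two examples appear to use different units (fractional seconds for ElevenLabs, what look like milliseconds for AssemblyAI), so normalize them separately if you rely on timestamps.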

Important Notes

Per-provider billing: AssemblyAI is billed at $0.15/hour, ElevenLabs at $0.39/hour. Charges are calculated every 5 seconds of audio streamed.
Transparent proxy: All frames are forwarded as-is in both directions. The gateway does not modify any data.
Concurrent connections: Subject to your account’s stream processing limits.
Unsupported parameters are ignored: If you pass a parameter that doesn’t apply to the selected provider, the provider will silently ignore it.
Recommended audio format: 16kHz sample rate, 16-bit signed little-endian, mono channel (PCM s16le).
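
If your audio is stored as a WAV file, the standard-library `wave` module can check it against the recommended format and extract the raw PCM before streaming. This is a minimal sketch that rejects mismatched files rather than resampling them; `pcm_from_wav` is a hypothetical helper name.

```python
import wave


def pcm_from_wav(source) -> bytes:
    """Return raw PCM from a WAV file after checking the recommended format.

    `source` is a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != 16000:
            raise ValueError(f"expected 16000 Hz, got {wav.getframerate()} Hz")
        if wav.getsampwidth() != 2:
            raise ValueError(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getnchannels() != 1:
            raise ValueError(f"expected mono, got {wav.getnchannels()} channels")
        # 16-bit WAV sample data is signed little-endian, i.e. PCM s16le.
        return wav.readframes(wav.getnframes())
```

At this format, 100 ms of audio is 16000 × 2 × 0.1 = 3200 bytes, which is the chunk size used in the Python example above.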
