Introduction

Visual Intelligence is a collection of stateless REST APIs that you call directly on your own video, image, and audio data. There is no persistent library and no indexing: pick the endpoint that matches your task, send a request, and get a result. If you want a persistent, indexed video and image library that you can query by natural language or by image, use the Visual Search product instead.

Core Capabilities

Video Scraping

Metadata, transcripts, captions, and comments from TikTok, YouTube, Instagram, and X/Twitter.

Video Processing

Upload, signed-URL upload, frame extraction, download, and delete: manage the assets used by other Visual Intelligence (VI) APIs.

Video & Image Understanding

Call Gemini / Nova / Qwen / GPT VLMs and ILMs directly. Pre-packaged tasks for frame description and summary.

Audio Transcription

Whisper, ElevenLabs, and AssemblyAI providers, plus speaker diarization and recognition, for files and live streams.

Live Stream Processing

Real-time content moderation and prompt-driven understanding on RTMP / RTSP streams.

Embeddings

Image / video / text embeddings for semantic search and retrieval pipelines.
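As a rough illustration of how embeddings typically feed a semantic search step, the sketch below ranks a few candidate vectors against a query vector by cosine similarity. The vectors, the item names, and the helper functions are all hypothetical placeholders. They are not output from, or part of, any Visual Intelligence endpoint, and real embeddings would have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 2-dimensional vectors standing in for real embeddings.
candidates = {
    "clip_a": [1.0, 0.0],
    "clip_b": [0.0, 1.0],
    "clip_c": [1.0, 1.0],
}
ranked = rank_by_similarity([1.0, 0.0], candidates)
```

In a real pipeline the query and candidate vectors would come from the same embedding model so that their similarity scores are comparable.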

Hosts

Most endpoints live on https://mavi-backend.memories.ai/serve/api/v2. Two specialty hosts are used for specific groups:

Group	Host
Most Visual Intelligence endpoints	`https://mavi-backend.memories.ai/serve/api/v2`
Live Video Understanding	`https://stream.memories.ai`
Human ReID & Caption (Security)	`https://security.memories.ai`

Each endpoint page includes a banner with its exact host and the authentication header to use.
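One way to keep the three hosts straight in client code is a small lookup table, sketched below. The base URLs are the ones listed above; the group keys, the `build_url` helper, and the `/example-endpoint` path are hypothetical names chosen for illustration. Authentication is deliberately left out, since the header to use is documented on each endpoint page.

```python
# Base URLs from the Hosts table; the dictionary keys are hypothetical labels.
HOSTS = {
    "default": "https://mavi-backend.memories.ai/serve/api/v2",
    "live_video_understanding": "https://stream.memories.ai",
    "human_reid_caption": "https://security.memories.ai",
}


def build_url(group: str, path: str) -> str:
    """Join the host for an endpoint group with a path.

    Groups without a specialty host fall back to the default host.
    """
    base = HOSTS.get(group, HOSTS["default"])
    return base.rstrip("/") + "/" + path.lstrip("/")


# "/example-endpoint" is a placeholder path, not a real endpoint.
stream_url = build_url("live_video_understanding", "/example-endpoint")
default_url = build_url("embeddings", "example-endpoint")
```

Always confirm the host against the banner on the endpoint page before relying on a mapping like this.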

Next Steps

API Overview

Browse every endpoint group with one-line descriptions — pick the right one for your use case.

Create Your API Key

Generate a key in under 2 minutes and start calling endpoints.
