Visual Intelligence is a collection of stateless REST APIs that you call directly on your own video, image, and audio data. There is no persistent library and no indexing: pick the endpoint that matches your task, send a request, and get a result. If you want a persistent, indexed video and image library that you can query by natural language or by image, use the Visual Search product instead.
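Because every call is stateless, each request must carry everything the endpoint needs. The sketch below only builds such a request with the standard library and does not send it. The endpoint path, the JSON body, and the use of an `Authorization` header for the API key are assumptions for illustration; check each endpoint's reference page for its real path, method, and auth scheme.

```python
import json
import urllib.request

# Main host, taken from the Hosts table on this page.
BASE_URL = "https://mavi-backend.memories.ai/serve/api/v2"


def build_request(path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build one self-contained POST request (not sent here).

    Assumptions: the endpoint accepts a JSON body and the key goes in an
    "Authorization" header. Neither is confirmed by this page.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{path.lstrip('/')}",
        data=body,
        headers={
            "Authorization": api_key,  # assumed header name
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical endpoint path and payload, for illustration only.
req = build_request(
    "/example/endpoint",
    "YOUR_API_KEY",
    {"video_url": "https://example.com/clip.mp4"},
)
print(req.get_method(), req.full_url)
# Sending it with urllib.request.urlopen(req) would perform the call;
# each call stands alone, with no server-side library between requests.
```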
Core Capabilities
Video Scraping
Metadata, transcripts, captions, and comments from TikTok, YouTube, Instagram, and X/Twitter.
Video Processing
Upload (direct or via signed URL), frame extraction, download, and delete operations for managing the assets used by other VI APIs.
Video & Image Understanding
Call Gemini / Nova / Qwen / GPT VLMs and ILMs directly. Pre-packaged tasks for frame description and summary.
Audio Transcription
Whisper, ElevenLabs, and AssemblyAI providers, plus speaker diarization and recognition, for both files and live streams.
Live Stream Processing
Real-time content moderation and prompt-driven understanding on RTMP / RTSP streams.
Embeddings
Image / video / text embeddings for semantic search and retrieval pipelines.
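Since Visual Intelligence keeps no index, a retrieval pipeline built on these embeddings has to store and rank the vectors itself. The sketch below assumes you have already fetched embeddings as plain lists of floats and ranks them with cosine similarity, a common choice for semantic search that this page does not prescribe. The item IDs and toy 3-dimensional vectors are made up.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query: List[float], corpus: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Rank stored embeddings by similarity to the query, highest first."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors stand in for real embeddings returned by the API.
corpus = {
    "clip_a": [1.0, 0.0, 0.0],
    "clip_b": [0.7, 0.7, 0.0],
    "clip_c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]
print(top_k(query, corpus, k=2))
```

For large collections, a dedicated vector index would replace the linear scan in `top_k`; the scoring idea stays the same.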
Hosts
Most endpoints live on https://mavi-backend.memories.ai/serve/api/v2. Two specialty hosts are used for specific groups:
| Group | Host |
|---|---|
| Most Visual Intelligence endpoints | https://mavi-backend.memories.ai/serve/api/v2 |
| Live Video Understanding | https://stream.memories.ai |
| Human ReID & Caption (Security) | https://security.memories.ai |
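A client that touches several endpoint groups needs to pick the right base URL per group. The sketch below encodes the Hosts table as a lookup with the main host as the fallback. The group keys (`live_video_understanding`, `human_reid_caption`) and the example paths are hypothetical labels chosen for this sketch, not identifiers defined by the API.

```python
# Base URLs copied from the Hosts table; group keys are hypothetical labels.
DEFAULT_HOST = "https://mavi-backend.memories.ai/serve/api/v2"
SPECIALTY_HOSTS = {
    "live_video_understanding": "https://stream.memories.ai",
    "human_reid_caption": "https://security.memories.ai",
}


def endpoint_url(group: str, path: str) -> str:
    """Return the full URL for `path` within an endpoint group.

    Any group not listed in SPECIALTY_HOSTS falls back to the main host.
    """
    base = SPECIALTY_HOSTS.get(group, DEFAULT_HOST)
    # Join with exactly one slash so the "/serve/api/v2" prefix is kept intact.
    return base.rstrip("/") + "/" + path.lstrip("/")


print(endpoint_url("video_processing", "/upload"))
print(endpoint_url("live_video_understanding", "/some/path"))
```

Plain string joining is used instead of `urllib.parse.urljoin`, because `urljoin` replaces the last path segment of a base URL that lacks a trailing slash and would drop the `v2` prefix.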
Next Steps
API Overview
Browse every endpoint group with one-line descriptions and pick the right one for your use case.
Create Your API Key
Generate a key in under 2 minutes and start calling endpoints.
