Visual Intelligence is a collection of stateless REST APIs that you call directly on your own video, image, and audio data. There is no persistent library and no indexing: pick the endpoint that matches your task, send a request, and get a result. If you need a persistent, indexed video and image library that you can query by natural language or by image, use the Visual Search product instead.

Core Capabilities

Video Scraping

Metadata, transcripts, captions, and comments from TikTok, YouTube, Instagram, and X/Twitter.

Video Processing

Upload, signed-URL upload, frame extraction, download, delete — manage assets used by other VI APIs.

Video & Image Understanding

Call Gemini / Nova / Qwen / GPT VLMs and ILMs directly. Pre-packaged tasks for frame description and summary.

Audio Transcription

Whisper, ElevenLabs, AssemblyAI providers + speaker diarization / recognition for files and live streams.

Live Stream Processing

Real-time content moderation and prompt-driven understanding on RTMP / RTSP streams.

Embeddings

Image / video / text embeddings for semantic search and retrieval pipelines.

Hosts

Most endpoints live on https://mavi-backend.memories.ai/serve/api/v2. Two specialty hosts are used for specific groups:
Group | Host
Most Visual Intelligence endpoints | https://mavi-backend.memories.ai/serve/api/v2
Live Video Understanding | https://stream.memories.ai
Human ReID & Caption (Security) | https://security.memories.ai
Each endpoint page includes a banner with its exact host and the authentication header to use.
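
As a rough illustration of the host split above, the following Python sketch keeps the three base URLs in one place and joins an endpoint path onto the right one. The group keys (`default`, `live_video_understanding`, `security`), the `build_url` helper, and the `/example/path` path are hypothetical names chosen for this sketch, not part of the API. The authentication header is deliberately left out, and whether the specialty hosts need an extra path prefix is not stated here, so take both from the banner on each endpoint page.

```python
# Base URLs copied from the Hosts table. The dictionary keys are
# illustrative labels invented for this sketch, not official identifiers.
HOSTS = {
    "default": "https://mavi-backend.memories.ai/serve/api/v2",
    "live_video_understanding": "https://stream.memories.ai",
    "security": "https://security.memories.ai",
}


def build_url(path: str, group: str = "default") -> str:
    """Join an endpoint path onto the base URL for its endpoint group.

    `path` is whatever path the endpoint page documents; this helper only
    normalizes the slash between the base URL and the path.
    """
    base = HOSTS[group]  # raises KeyError for an unknown group label
    return base.rstrip("/") + "/" + path.lstrip("/")


if __name__ == "__main__":
    # "/example/path" is a placeholder, not a real endpoint.
    print(build_url("/example/path"))
    print(build_url("/example/path", group="live_video_understanding"))
```

Centralizing the hosts this way means a client only has to decide which group an endpoint belongs to; the full URL and the authentication header should still be checked against each endpoint page's banner.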

Next Steps

API Overview

Browse every endpoint group with one-line descriptions — pick the right one for your use case.

Create Your API Key

Generate a key in under 2 minutes and start calling endpoints.