Every Visual Intelligence endpoint lives under one of the groups below. Skim by use case, then jump into the relevant group page for the full per-endpoint reference. For a higher-level introduction to what Visual Intelligence is and how it differs from Visual Search, see the Introduction.
Choose by Use Case
| I want to… | Group | Example endpoints |
|---|---|---|
| Upload a file to use across other Visual Intelligence APIs | Asset Management | POST /upload, POST /upload/signed-url |
| Pull metadata / transcript / comments from YouTube, TikTok, Instagram, Twitter | Social Media Scraping | POST /tiktok/video/detail, POST /youtube/video/transcript |
| Transcribe an uploaded audio or video file | Audio File Transcription | POST /transcriptions/sync-generate-audio, ElevenLabs, AssemblyAI |
| Transcribe a live audio stream in real time | Live Audio Transcription | Server-pull POST /audio-stream/start or direct WebSocket |
| Call a Video Language Model directly with my own prompt | Video Model APIs | Gemini VLM, Nova VLM, Qwen VLM |
| Get a ready-made video analysis (no prompt writing) | Video Task APIs | Video Frame Description, Video Summary |
| Run real-time content moderation / logo detection on a live RTMP stream | Live Video Content Moderation | POST /stream/start, POST /stream/stop |
| Apply a custom AI prompt continuously to a live RTMP stream | Live Video Understanding | POST /v1/understand/streamConnect |
| Call an Image Language Model directly | Image Model APIs | Gemini ILM, GPT ILM, Nova ILM, Qwen ILM |
| Generate vector embeddings for image / video / text | Embeddings | POST /embeddings/image, /video, /text |
| Caption a video or image and identify specific people by name | Human ReID & Caption | Requires a dedicated security API key |
Base URLs
Most endpoints share one host. A few specialty groups use their own hosts; every endpoint page declares its host in the banner at the top.

| Used by | Host |
|---|---|
| Asset Management, Scraping, Transcription, Model/Task APIs, Embeddings, Live Audio / Live Moderation, Agents | https://mavi-backend.memories.ai/serve/api/v2 |
| Live Video Understanding | https://stream.memories.ai |
| Human ReID & Caption | https://security.memories.ai |
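As a rough illustration of the host split above, the sketch below picks a host per group and joins it with an endpoint path. The group keys (`live_video_understanding`, `human_reid_caption`) and the `build_url` helper are hypothetical names introduced here, and the sketch assumes endpoint paths are appended directly to the host shown in the table; check each endpoint page's banner for the authoritative URL.

```python
# Hosts are copied from the table above; the dictionary keys are
# illustrative labels, not identifiers defined by the API.
DEFAULT_HOST = "https://mavi-backend.memories.ai/serve/api/v2"
SPECIAL_HOSTS = {
    "live_video_understanding": "https://stream.memories.ai",
    "human_reid_caption": "https://security.memories.ai",
}


def build_url(group: str, path: str) -> str:
    """Join the host for `group` with an endpoint path such as '/upload'."""
    host = SPECIAL_HOSTS.get(group, DEFAULT_HOST)
    # Normalise slashes so exactly one separates host and path.
    return host.rstrip("/") + "/" + path.lstrip("/")


print(build_url("asset_management", "/upload"))
print(build_url("live_video_understanding", "/v1/understand/streamConnect"))
```

Keeping the exceptions in one small mapping means any group not listed falls back to the shared host.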
Authentication
All requests use API key auth via the Authorization header, with no Bearer prefix. To obtain an API key (sk-mavi-...), contact support@memories.ai.
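A minimal sketch of the header rule, using only the Python standard library. The `auth_headers` helper and the placeholder key `sk-mavi-EXAMPLE` are assumptions made for illustration; the request object is built but never sent, and no request body is shown because body formats are defined on the individual endpoint pages.

```python
import urllib.request


def auth_headers(api_key: str) -> dict[str, str]:
    """Return headers carrying the raw API key, with no 'Bearer ' prefix."""
    if api_key.lower().startswith("bearer "):
        raise ValueError("pass the raw key; the API expects no Bearer prefix")
    return {"Authorization": api_key}


# Placeholder key for illustration only; substitute your real key.
headers = auth_headers("sk-mavi-EXAMPLE")

# Build (but do not send) a POST request to show where the header goes.
request = urllib.request.Request(
    "https://mavi-backend.memories.ai/serve/api/v2/upload",
    headers=headers,
    method="POST",
)
print(request.get_header("Authorization"))
```

Rejecting a `Bearer ` prefix up front is a design choice in this sketch: it catches a habit carried over from other APIs before the request leaves your code.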
Sync vs Async
| Pattern | Shape | When |
|---|---|---|
| Sync | Request → result in response body | Fast operations (sync transcription, model calls, embeddings) |
| Async | Request → task_id → result arrives at your webhook | Long operations (async transcription, video tasks, live streams) |
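The async pattern implies some bookkeeping on the client side: remember each task_id when a request is submitted, then match it when the webhook fires. The sketch below shows one way to do that in memory. It assumes the webhook payload is a JSON object carrying a `task_id` field; that field name, the `TaskTracker` class, and the sample payload are illustrative assumptions, not the documented callback schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TaskTracker:
    """In-memory map from task_id to a local label for the original request."""

    pending: dict[str, str] = field(default_factory=dict)   # task_id -> label
    results: dict[str, Any] = field(default_factory=dict)   # label -> payload

    def register(self, task_id: str, label: str) -> None:
        """Record a task_id returned by an async endpoint."""
        self.pending[task_id] = label

    def handle_callback(self, payload: dict[str, Any]) -> bool:
        """Store a webhook payload; return False if its task_id is unknown."""
        # Assumption: the payload exposes the id under the key "task_id".
        label = self.pending.pop(payload.get("task_id"), None)
        if label is None:
            return False
        self.results[label] = payload
        return True


tracker = TaskTracker()
tracker.register("task-123", "demo-video-summary")
accepted = tracker.handle_callback({"task_id": "task-123", "status": "done"})
print(accepted, sorted(tracker.results))
```

In a real deployment the pending map would likely live in a database or cache so that callbacks survive a process restart.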
Core Concepts
| Term | Meaning |
|---|---|
| asset_id | Unique identifier (e.g. re_660727003963174912) returned by POST /upload. Used to reference the asset in subsequent calls. |
| task_id | Returned by async endpoints. Lets you track progress or correlate webhook callbacks to the original request. |
| VLM | Video Language Model — Gemini / Nova / Qwen (used by Video Model APIs and Video Task APIs) |
| ILM | Image Language Model — Gemini / GPT / Nova / Qwen (used by Image Model APIs) |
| Model API vs Task API | A Model API takes your own prompt and gives you full control. A Task API wraps a Model API in a fixed prompt and workflow for a specific use case. |
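The Model API versus Task API distinction can be pictured as a function that takes a prompt versus one that has the prompt baked in. The sketch below is purely conceptual: `fake_vlm` stands in for a Model API call and runs locally, `make_task` and the prompt text are hypothetical, and none of this reflects how the service implements its Task APIs.

```python
from typing import Callable

# A "model call" here takes an asset_id and a caller-supplied prompt.
ModelCall = Callable[[str, str], str]


def make_task(model_call: ModelCall, fixed_prompt: str) -> Callable[[str], str]:
    """Wrap a model call so that callers supply only the asset_id."""

    def task(asset_id: str) -> str:
        return model_call(asset_id, fixed_prompt)

    return task


def fake_vlm(asset_id: str, prompt: str) -> str:
    """Local stand-in for a Model API; it just echoes its inputs."""
    return f"[{asset_id}] {prompt}"


# Model-API style: the caller writes the prompt.
print(fake_vlm("re_660727003963174912", "List every on-screen logo."))

# Task-API style: the prompt is fixed by the wrapper.
video_summary = make_task(fake_vlm, "Summarize this video.")
print(video_summary("re_660727003963174912"))
```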
Choose a Video Model
Compare Gemini, Nova, and Qwen for video understanding, structured extraction, and cost tradeoffs.
Choose an Image Model
Compare Gemini, GPT, Nova, and Qwen for image reasoning, JSON output, and operational cost.
