API Overview

Every Visual Intelligence endpoint lives under one of the groups below. Skim by use case, then jump into the relevant group page for the full per-endpoint reference. For a higher-level introduction to what Visual Intelligence is and how it differs from Visual Search, see the Introduction.

Choose by Use Case

I want to…	Group	Example endpoints
Upload a file to use across other Visual Intelligence APIs	Asset Management	`POST /upload`, `POST /upload/signed-url`
Pull metadata / transcript / comments from YouTube, TikTok, Instagram, Twitter	Social Media Scraping	`POST /tiktok/video/detail`, `POST /youtube/video/transcript`
Transcribe an uploaded audio or video file	Audio File Transcription	`POST /transcriptions/sync-generate-audio`, ElevenLabs, AssemblyAI
Transcribe a live audio stream in real time	Live Audio Transcription	Server-pull `POST /audio-stream/start` or direct WebSocket
Call a Video Language Model directly with my own prompt	Video Model APIs	Gemini VLM, Nova VLM, Qwen VLM
Get a ready-made video analysis (no prompt writing)	Video Task APIs	Video Frame Description, Video Summary
Run real-time content moderation / logo detection on a live RTMP stream	Live Video Content Moderation	`POST /stream/start`, `POST /stream/stop`
Apply a custom AI prompt continuously to a live RTMP stream	Live Video Understanding	`POST /v1/understand/streamConnect`
Call an Image Language Model directly	Image Model APIs	Gemini ILM, GPT ILM, Nova ILM, Qwen ILM
Generate vector embeddings for image / video / text	Embeddings	`POST /embeddings/image`, `/video`, `/text`
Caption a video or image and identify specific people by name	Human ReID & Caption	Requires a dedicated security API key

Base URLs

Most endpoints share one host. A few specialty groups use their own hosts — every endpoint page declares its host in the banner at the top.

Used by	Host
Asset Management, Scraping, Transcription, Model/Task APIs, Embeddings, Live Audio / Live Moderation, Agents	`https://mavi-backend.memories.ai/serve/api/v2`
Live Video Understanding	`https://stream.memories.ai`
Human ReID & Caption	`https://security.memories.ai`
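
As a minimal sketch of how the host table might be used in client code, the helper below maps a group to its host and joins it with an endpoint path. The group keys and the `build_url` function are hypothetical names chosen for illustration. It also assumes that endpoint paths such as `/upload` are appended directly to the host shown in the table, so check each endpoint page's banner before relying on it.

```python
# Hosts are copied from the table above. The group keys ("default", ...) are
# hypothetical labels chosen for this sketch, not identifiers from the API.
HOSTS = {
    "default": "https://mavi-backend.memories.ai/serve/api/v2",
    "live_video_understanding": "https://stream.memories.ai",
    "human_reid_caption": "https://security.memories.ai",
}


def build_url(path: str, group: str = "default") -> str:
    """Join a group's host with an endpoint path without doubling slashes."""
    if group not in HOSTS:
        raise ValueError(f"unknown group: {group!r}")
    return HOSTS[group].rstrip("/") + "/" + path.lstrip("/")


print(build_url("/upload"))
print(build_url("/v1/understand/streamConnect", group="live_video_understanding"))
```

Raising on an unknown group, rather than silently falling back to the shared host, keeps a typo from sending a request to the wrong service.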

Authentication

All requests authenticate with an API key sent as the raw value of the Authorization header, with no Bearer prefix:

Authorization: sk-mavi-...xxxxxxxxxxxxx

Human ReID & Caption requires a dedicated key, separate from the standard sk-mavi-... key. Contact support@memories.ai to obtain one.

Do not share your API key publicly or commit it to version control. Use environment variables to manage your keys securely.
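
The two rules above (raw key in the Authorization header, key kept in the environment) could be combined roughly as follows. This is a sketch under stated assumptions: the environment variable name `MAVI_API_KEY` and the `auth_headers` helper are hypothetical, and the request is only constructed, never sent, because the body each endpoint expects is described on its own reference page.

```python
import os
import urllib.request
from typing import Mapping


def auth_headers(environ: Mapping[str, str] = os.environ,
                 var: str = "MAVI_API_KEY") -> dict:
    """Return request headers carrying the raw API key (no "Bearer " prefix).

    MAVI_API_KEY is a hypothetical variable name; use whatever your
    deployment already defines.
    """
    key = environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"environment variable {var} is not set")
    if key.lower().startswith("bearer "):
        raise ValueError("send the key as-is; no Bearer prefix is used")
    return {"Authorization": key}


# Demo with a stand-in mapping so the sketch runs without a real key.
headers = auth_headers({"MAVI_API_KEY": "sk-mavi-example"})
request = urllib.request.Request(
    "https://mavi-backend.memories.ai/serve/api/v2/upload",
    headers=headers,
    method="POST",
)  # built but not sent
print(request.get_method(), request.get_header("Authorization"))
```

In real use, call `auth_headers()` with no arguments so the key is read from the process environment rather than from source code.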

Sync vs Async

Pattern	Shape	When
Sync	Request → result in response body	Fast operations (sync transcription, model calls, embeddings)
Async	Request → `task_id` → result arrives at your webhook	Long operations (async transcription, video tasks, live streams)

Async endpoints require a configured webhook URL. See Webhooks.
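
Because async results arrive at a webhook rather than in the response body, a client typically needs to remember what each `task_id` was for. The sketch below shows one way to do that in memory. The `PendingTasks` class is hypothetical, and the callback is assumed to be a JSON object containing a `task_id` field; the `status` field in the demo is purely illustrative, so consult the Webhooks page for the actual payload.

```python
from typing import Any, Dict, Optional


class PendingTasks:
    """Remembers the context of each async request until its callback arrives."""

    def __init__(self) -> None:
        self._pending: Dict[str, Dict[str, Any]] = {}

    def register(self, task_id: str, context: Dict[str, Any]) -> None:
        """Store context for a task_id returned by an async endpoint."""
        self._pending[task_id] = context

    def resolve(self, callback: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """Pair a webhook payload with the request that produced it.

        Returns None when the task_id is unknown or was already handled,
        which also covers duplicate webhook deliveries.
        """
        task_id = callback.get("task_id")  # assumed field name
        context = self._pending.pop(task_id, None)
        if context is None:
            return None
        return {"context": context, "callback": callback}


tasks = PendingTasks()
tasks.register("task-123", {"purpose": "video summary"})  # made-up task_id

# "status" is an illustrative field, not a documented one.
first = tasks.resolve({"task_id": "task-123", "status": "done"})
repeat = tasks.resolve({"task_id": "task-123", "status": "done"})
print(first)
print(repeat)
```

An in-memory dictionary is lost on restart; a production service would more likely persist the mapping in a database keyed by `task_id`.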

Core Concepts

Term	Meaning
`asset_id`	Unique identifier (e.g. `re_660727003963174912`) returned by `POST /upload`. Used to reference the asset in subsequent calls.
`task_id`	Returned by async endpoints. Lets you track progress or correlate webhook callbacks to the original request.
VLM	Video Language Model — Gemini / Nova / Qwen (used by Video Model APIs and Video Task APIs)
ILM	Image Language Model — Gemini / GPT / Nova / Qwen (used by Image Model APIs)
Model API vs Task API	A Model API takes your own prompt and gives you full control. A Task API wraps a Model API in a fixed prompt and workflow for a specific use case.

Choose a Video Model

Compare Gemini, Nova, and Qwen for video understanding, structured extraction, and cost tradeoffs.

Choose an Image Model

Compare Gemini, GPT, Nova, and Qwen for image reasoning, JSON output, and operational cost.

Get Started

Asset Management

Social Media Scraping

Audio File Transcription

Live Audio Transcription

Video Model APIs

Video Task APIs

Live Video Content Moderation

Live Video Understanding

Image Model APIs

Embeddings

Human ReID & Caption
