Every Visual Intelligence endpoint lives under one of the groups below. Skim by use case, then jump into the relevant group page for the full per-endpoint reference. For a higher-level introduction to what Visual Intelligence is and how it differs from Visual Search, see the Introduction.

Choose by Use Case

| I want to… | Group | Example endpoints |
| --- | --- | --- |
| Upload a file to use across other Visual Intelligence APIs | Asset Management | POST /upload, POST /upload/signed-url |
| Pull metadata / transcript / comments from YouTube, TikTok, Instagram, Twitter | Social Media Scraping | POST /tiktok/video/detail, POST /youtube/video/transcript |
| Transcribe an uploaded audio or video file | Audio File Transcription | POST /transcriptions/sync-generate-audio, ElevenLabs, AssemblyAI |
| Transcribe a live audio stream in real time | Live Audio Transcription | Server-pull POST /audio-stream/start or direct WebSocket |
| Call a Video Language Model directly with my own prompt | Video Model APIs | Gemini VLM, Nova VLM, Qwen VLM |
| Get a ready-made video analysis (no prompt writing) | Video Task APIs | Video Frame Description, Video Summary |
| Run real-time content moderation / logo detection on a live RTMP stream | Live Video Content Moderation | POST /stream/start, POST /stream/stop |
| Apply a custom AI prompt continuously to a live RTMP stream | Live Video Understanding | POST /v1/understand/streamConnect |
| Call an Image Language Model directly | Image Model APIs | Gemini ILM, GPT ILM, Nova ILM, Qwen ILM |
| Generate vector embeddings for image / video / text | Embeddings | POST /embeddings/image, /video, /text |
| Caption a video or image and identify specific people by name | Human ReID & Caption | Requires a dedicated security API key |

Base URLs

Most endpoints share one host. A few specialty groups use their own hosts — every endpoint page declares its host in the banner at the top.
| Used by | Host |
| --- | --- |
| Asset Management, Scraping, Transcription, Model/Task APIs, Embeddings, Live Audio / Live Moderation, Agents | https://mavi-backend.memories.ai/serve/api/v2 |
| Live Video Understanding | https://stream.memories.ai |
| Human ReID & Caption | https://security.memories.ai |
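
A client that talks to more than one group needs to pick the right host per request. The sketch below is one way to do that. The hosts are copied from the table, but the group labels, the `build_url` helper, and the assumption that endpoint paths append directly to the host are illustrative only; confirm the host in the banner of each endpoint page.

```python
# Hosts copied from the Base URLs table. The dictionary keys are
# hypothetical labels chosen for this sketch, not official identifiers.
DEFAULT_HOST = "https://mavi-backend.memories.ai/serve/api/v2"
SPECIALTY_HOSTS = {
    "live_video_understanding": "https://stream.memories.ai",
    "human_reid_caption": "https://security.memories.ai",
}


def build_url(path: str, group: str = "") -> str:
    """Join an endpoint path onto the host for its group.

    Assumes paths are appended directly to the host; any group that is
    not listed as a specialty group falls back to the shared host.
    """
    host = SPECIALTY_HOSTS.get(group, DEFAULT_HOST)
    return host.rstrip("/") + "/" + path.lstrip("/")
```

For example, `build_url("/upload")` resolves against the shared host, while `build_url("/v1/understand/streamConnect", "live_video_understanding")` resolves against `https://stream.memories.ai`.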

Authentication

All requests use API key auth via the Authorization header — no Bearer prefix:
Authorization: sk-mavi-...xxxxxxxxxxxxx
Human ReID & Caption needs a dedicated key (different from the standard sk-mavi-...) — contact support@memories.ai.
Do not share your API key publicly or commit it to version control. Use environment variables to manage your keys securely.
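
A minimal sketch of the header rule, using only the Python standard library. It reads the key from an environment variable and passes it as the raw `Authorization` value, with no `Bearer` prefix. The variable name `MEMORIES_API_KEY` and the helper names are assumptions made for this example, not names defined by the API.

```python
import os
import urllib.request


def auth_headers(env_var: str = "MEMORIES_API_KEY") -> dict:
    """Return headers carrying the raw API key (no "Bearer " prefix).

    The environment variable name is a hypothetical choice for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"Authorization": api_key}


def make_request(url: str, body: bytes = b"") -> urllib.request.Request:
    """Prepare, but do not send, an authenticated POST request."""
    return urllib.request.Request(
        url, data=body, headers=auth_headers(), method="POST"
    )
```

Keeping the key in the environment rather than in source code follows the advice above about not committing keys to version control.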

Sync vs Async

| Pattern | Shape | When |
| --- | --- | --- |
| Sync | Request → result in response body | Fast operations (sync transcription, model calls, embeddings) |
| Async | Request → task_id → result arrives at your webhook | Long operations (async transcription, video tasks, live streams) |
Async endpoints require a configured webhook URL. See Webhooks.
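
On the receiving side, the async pattern comes down to matching each callback to the request that produced it. The sketch below shows one way to do that with an in-memory store. It assumes the webhook payload carries a top-level `task_id` field, which this page does not confirm; check the Webhooks page for the actual payload shape. All function names here are hypothetical.

```python
from typing import Any, Dict

# Maps task_id -> context saved when the async request was submitted.
# An in-memory dict is used for illustration; a real service would
# likely need persistent storage.
pending: Dict[str, Dict[str, Any]] = {}


def remember_task(task_id: str, context: Dict[str, Any]) -> None:
    """Record the context of an async request under its task_id."""
    pending[task_id] = context


def handle_webhook(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Match a callback to its original request via task_id.

    Assumes a top-level "task_id" key in the payload (unverified).
    Each task is matched at most once; repeats come back unmatched.
    """
    task_id = payload.get("task_id")
    context = pending.pop(task_id, None)
    if context is None:
        return {"matched": False, "task_id": task_id}
    return {"matched": True, "task_id": task_id,
            "context": context, "result": payload}
```

Removing the entry on first match means a duplicate or unknown callback is reported as unmatched instead of being processed twice.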

Core Concepts

| Term | Meaning |
| --- | --- |
| asset_id | Unique identifier (e.g. re_660727003963174912) returned by POST /upload. Used to reference the asset in subsequent calls. |
| task_id | Returned by async endpoints. Lets you track progress or correlate webhook callbacks to the original request. |
| VLM | Video Language Model: Gemini / Nova / Qwen (used by Video Model APIs and Video Task APIs) |
| ILM | Image Language Model: Gemini / GPT / Nova / Qwen (used by Image Model APIs) |
| Model API vs Task API | A Model API takes your own prompt and gives you full control. A Task API wraps a Model API in a fixed prompt and workflow for a specific use case. |

Choose a Video Model

Compare Gemini, Nova, and Qwen for video understanding, structured extraction, and cost tradeoffs.

Choose an Image Model

Compare Gemini, GPT, Nova, and Qwen for image reasoning, JSON output, and operational cost.