Overview

Memories.ai REST API

API tools for perceiving videos

Memories.ai Video Intelligence is an all-in-one API platform for video scraping, video processing, and video understanding.

Base URL

All API requests are made to the following base URL:

https://mavi-backend.memories.ai/serve/api/v2
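As a minimal sketch of how endpoint paths combine with the base URL: the helper name `endpoint_url` is hypothetical and not part of any official SDK. Plain concatenation is used on purpose, because `urllib.parse.urljoin` would drop the `/serve/api/v2` prefix when given a path that starts with a slash.

```python
BASE_URL = "https://mavi-backend.memories.ai/serve/api/v2"


def endpoint_url(path: str) -> str:
    """Join an endpoint path onto the base URL with exactly one slash."""
    return BASE_URL.rstrip("/") + "/" + path.lstrip("/")


print(endpoint_url("/upload"))
# → https://mavi-backend.memories.ai/serve/api/v2/upload
```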

Authentication

All API requests require authentication via API key. Include your API key directly in the Authorization header (no Bearer prefix):

Authorization: sk-mai-xxxxxxxxxxxxxxxx

Do not share your API key publicly or commit it to version control. Use environment variables to manage your keys securely.
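One way to follow this advice is to read the key from an environment variable and build the header from it. This is a sketch under assumptions: the variable name `MEMORIES_API_KEY` and the helper `auth_headers` are illustrative choices, not names defined by the API.

```python
import os


def auth_headers(api_key=None):
    """Return request headers carrying the raw API key (no Bearer prefix)."""
    # MEMORIES_API_KEY is an assumed variable name; use whatever your deployment defines.
    key = api_key or os.environ.get("MEMORIES_API_KEY", "")
    if not key:
        raise RuntimeError("No API key provided and MEMORIES_API_KEY is not set")
    if key.lower().startswith("bearer "):
        raise ValueError("Pass the key directly, without a Bearer prefix")
    return {"Authorization": key}
```

Calling `auth_headers("sk-mai-xxxxxxxxxxxxxxxx")` returns `{"Authorization": "sk-mai-xxxxxxxxxxxxxxxx"}`, which can be passed as the headers of any HTTP client.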

Core Capabilities

Video Scraping

Scrape metadata, comments, and details from platforms like TikTok, YouTube, Instagram, and Twitter.

Video Processing

Edit, split, clip, and extract frames from videos. Generate transcripts and metadata.

Video Understanding

Leverage state-of-the-art models like Gemini, Nova, and Qwen for deep video and image understanding.

Core Concepts

Concept	Description
asset_id	A unique identifier (e.g., `re_660727003963174912`) returned after uploading a file. Used to reference the asset in all subsequent API calls.
task_id	Returned by asynchronous endpoints. Use it to track the progress of long-running operations.
Webhook Callback	For async operations, the API sends results to your configured webhook URL when processing completes. See Webhooks Configuration.
VLM	Video Language Model — AI models that understand and analyze video content (e.g., Gemini, Nova, Qwen).
ILM	Image Language Model — AI models that understand and analyze image content.

Typical Workflow

1. Upload a video/image        →  POST /upload        →  Returns asset_id
2. Call an API endpoint        →  POST /video/edit    →  Returns task_id (async)
3. Receive results via webhook →  Webhook callback    →  Contains processed data
   OR, for sync endpoints      →  Response body       →  Contains results directly
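The second step of this workflow might look roughly like the following, using only the Python standard library. The paths and the example `asset_id` come from this page; the JSON body shape (a single `asset_id` field) and the helper `build_post` are assumptions for illustration, so check the endpoint reference for the actual request schema. The upload step is omitted because its request format is not described here.

```python
import json
import urllib.request

BASE_URL = "https://mavi-backend.memories.ai/serve/api/v2"


def build_post(path, api_key, payload):
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Reference a previously uploaded asset by its asset_id.
# The "asset_id" body field is an assumed name, not a confirmed schema.
req = build_post(
    "/video/edit",
    "sk-mai-xxxxxxxxxxxxxxxx",
    {"asset_id": "re_660727003963174912"},
)
print(req.get_method(), req.full_url)
# → POST https://mavi-backend.memories.ai/serve/api/v2/video/edit

# To send it: urllib.request.urlopen(req). For an async endpoint, the response
# would carry a task_id and the results would arrive later at your webhook.
```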

API Categories

Scraper: Extract data from social media platforms.
Base & Processing: Essential operations for upload, download, editing, and frame extraction.
Understanding Models: Analyze content using advanced VLMs (Video Language Models) and ILMs (Image Language Models).
Transcription: Generate audio and video transcripts with speaker recognition.
Embeddings: Generate vector embeddings for videos, images, and text for similarity search and retrieval.

Choose a Video Model

Compare Gemini, Nova, and Qwen for video understanding, structured extraction, and cost tradeoffs.

Choose an Image Model

Compare Gemini, GPT, Nova, and Qwen for image reasoning, JSON output, and operational cost.

Async endpoints require a webhook to receive results. Please complete the Webhooks configuration before using any async API.
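A webhook receiver can be quite small. The sketch below uses Python's built-in `http.server` and stores each callback under its `task_id`. The callback body format is an assumption here (a JSON object with a top-level `task_id` field); the Webhooks Configuration page is the place to confirm the real payload shape and any verification requirements.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RESULTS = {}  # task_id -> callback payload, kept in memory for illustration


def record_callback(body, store):
    """Parse a callback body and file it under its task_id."""
    payload = json.loads(body)
    task_id = payload["task_id"]  # assumed field name
    store[task_id] = payload
    return task_id


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            record_callback(self.rfile.read(length), RESULTS)
        except (ValueError, KeyError, TypeError):
            # Malformed JSON or a body without the expected field.
            self.send_response(400)
        else:
            self.send_response(200)
        self.end_headers()


# To run locally (port 8000 is an arbitrary choice):
# HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```

Keeping the parsing in `record_callback`, separate from the HTTP handler, makes it easy to test without starting a server and to swap the in-memory dictionary for a database later.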
