Memories.ai REST API

API tools for perceiving videos

Memories.ai Video Intelligence is an all-in-one API platform for video scraping, video processing, and video understanding.

Base URL

All API requests are made to the following base URL:
https://mavi-backend.memories.ai/serve/api/v2

Authentication

All API requests require authentication via API key. Include your API key directly in the Authorization header (no Bearer prefix):
Authorization: sk-mai-xxxxxxxxxxxxxxxx
Do not share your API key publicly or commit it to version control. Use environment variables to manage your keys securely.
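
As a minimal sketch of the advice above, the key can be read from an environment variable at runtime instead of being written into source code. The variable name MEMORIES_AI_API_KEY and the helper auth_headers are illustrative assumptions, not part of the API; only the header format (raw key, no Bearer prefix) comes from this page.

```python
import os


def auth_headers(env_var: str = "MEMORIES_AI_API_KEY") -> dict[str, str]:
    """Build request headers from an API key stored in the environment.

    The environment variable name is an assumption made for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {env_var} before calling the API")
    # The key is sent as-is: no "Bearer " prefix.
    return {"Authorization": api_key}
```

Failing early when the variable is unset avoids sending unauthenticated requests by accident.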

Core Capabilities

Video Scraping

Scrape metadata, comments, and details from platforms like TikTok, YouTube, Instagram, and Twitter.

Video Processing

Edit, split, clip, and extract frames from videos. Generate transcripts and metadata.

Video Understanding

Use state-of-the-art models such as Gemini, Nova, and Qwen for video and image understanding.

Core Concepts

Concept | Description
asset_id | A unique identifier (e.g., re_660727003963174912) returned after uploading a file. Used to reference the asset in all subsequent API calls.
task_id | Returned by asynchronous endpoints. Use it to track the progress of long-running operations.
Webhook Callback | For async operations, the API sends results to your configured webhook URL when processing completes. See Webhooks Configuration.
VLM | Video Language Model: an AI model that understands and analyzes video content (e.g., Gemini, Nova, Qwen).
ILM | Image Language Model: an AI model that understands and analyzes image content.

Typical Workflow

1. Upload a video/image       →  POST /upload         →  Returns asset_id
2. Call an API endpoint        →  POST /video/edit     →  Returns task_id (async)
3. Receive results via webhook →  Webhook callback     →  Contains processed data
   OR read the sync response   →  Response body        →  Contains results directly
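
Step 2 of the workflow above might be sketched as follows, using only the Python standard library. The base URL, the /video/edit path, and the header format come from this page. The JSON request body, the "asset_id" payload field, the top-level "task_id" response field, and the helper names are assumptions for illustration, so check the endpoint reference for the real schema.

```python
import json
import urllib.request

BASE_URL = "https://mavi-backend.memories.ai/serve/api/v2"


def build_post(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request to an API path."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,  # raw key, no Bearer prefix
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_task_id(response_body: bytes) -> str:
    """Pull task_id out of a JSON response.

    Assumes task_id is a top-level field; the real response may nest it.
    """
    return json.loads(response_body)["task_id"]


# Reference a previously uploaded asset by its asset_id.
# The payload field name "asset_id" is an assumption for illustration.
request = build_post(
    "/video/edit",
    {"asset_id": "re_660727003963174912"},
    api_key="sk-mai-xxxxxxxxxxxxxxxx",
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Building the request separately from sending it keeps the sketch free of network calls and makes the URL and headers easy to inspect.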

API Categories

  • Scraper: Extract data from social media platforms.
  • Base & Processing: Essential operations for upload, download, editing, and frame extraction.
  • Understanding Models: Analyze content using advanced VLMs (Video Language Models) and ILMs (Image Language Models).
  • Transcription: Generate audio and video transcripts with speaker recognition.
  • Embeddings: Generate vector embeddings for videos, images, and text for similarity search and retrieval.

Choose a Video Model

Compare Gemini, Nova, and Qwen for video understanding, structured extraction, and cost tradeoffs.

Choose an Image Model

Compare Gemini, GPT, Nova, and Qwen for image reasoning, JSON output, and operational cost.
Async endpoints require a webhook to receive results. Please complete the Webhooks configuration before using any async API.
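
For orientation, a webhook receiver could look roughly like the sketch below, written with Python's built-in http.server. The payload shape is an assumption: this example expects a JSON body with a top-level "task_id" field, which this page does not confirm. The names RESULTS, record_callback, WebhookHandler, and serve, as well as port 8000, are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory store of finished tasks, keyed by task_id.
RESULTS: dict[str, dict] = {}


def record_callback(raw_body: bytes, store: dict[str, dict]) -> str:
    """Parse a webhook body and file it under its task_id.

    Assumes the callback is JSON with a top-level "task_id" field;
    check the Webhooks Configuration page for the real payload shape.
    """
    payload = json.loads(raw_body)
    task_id = payload["task_id"]
    store[task_id] = payload
    return task_id


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            record_callback(self.rfile.read(length), RESULTS)
        except (ValueError, KeyError, TypeError):
            self.send_response(400)  # malformed or unexpected payload
        else:
            self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver; the public URL must match your webhook configuration."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Keeping the parsing in record_callback, separate from the HTTP handler, lets the payload handling be tested without starting a server.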