Rate Limit Overview

To ensure platform stability and fair usage, the Memories.ai API enforces rate limits based on the type of API endpoint you are calling.
Rate limits are applied per account. All API keys under the same account share the same rate limit quota.

Rate Limit Tiers

Standard APIs

1 QPS (queries per second)
Applies to most API endpoints, including:
  • Video / Audio / Image Upload
  • Transcription
  • Embeddings Generation
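
Because the 1 QPS quota is shared by every API key on the account, it can help to pace requests on the client before they are sent. The following is a minimal sketch of such a pacer, not part of the Memories.ai API: the `Throttle` class, its `wait` method, and the `clock` and `sleep` parameters are hypothetical names introduced here for illustration.

```python
import threading
import time


class Throttle:
    """Spaces calls at least 1 / qps seconds apart across all threads."""

    def __init__(self, qps=1.0, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / qps
        self._clock = clock  # injectable so the pacing can be tested without waiting
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0  # earliest time the next call may start

    def wait(self):
        """Reserve the next request slot, sleep until it starts, return the delay."""
        with self._lock:
            now = self._clock()
            start = max(now, self._next_allowed)
            self._next_allowed = start + self._interval
        delay = start - now
        if delay > 0:
            self._sleep(delay)
        return delay
```

Call `wait()` immediately before each request to a Standard API endpoint. Since the quota is per account, one `Throttle` instance would need to be shared by all code paths that use any of the account's keys; separate processes or machines would each need their own share of the budget.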

Scraping & Task APIs

Varies by endpoint and channel
Applies to scraping and long-running task endpoints, including:
  • YouTube / TikTok / Instagram / Twitter Scraping
  • Async task-based processing

Understanding Models

Model Provider Rate Limit
Rate limits follow the underlying model provider’s own limits:
  • Video Understanding Models (VLM)
  • Image Understanding Models (ILM)

Stream Processing

Concurrent Stream Limit
Limited by the maximum number of concurrent streams per account (video + audio combined).

Detailed Rate Limits by Endpoint

Video Processing — 1 QPS

Transcription — 1 QPS

Embeddings — 1 QPS

Stream Processing — Concurrent Stream Limit

Access Required: Stream processing features are not enabled by default. Please contact sales to enable stream processing for your account.
Stream processing endpoints are limited by the maximum number of concurrent streams per account (video + audio combined), rather than QPS.
Endpoint | Rate Limit
Start Video Stream Moderation | Max N concurrent streams
Stop Video Stream Moderation | No Limit
Start Audio Stream Transcription | Max N concurrent streams
Stop Audio Stream Transcription | No Limit
When the server capacity is reached, the API returns status code 16 (Capacity Reached). Please retry later or contact sales for a higher concurrent stream limit.
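
A client can avoid most Capacity Reached responses by tracking how many streams it has open. The sketch below is one way to do that locally, under stated assumptions: `StreamSlots`, `StreamLimitReached`, and `max_streams` are hypothetical names, and the value of N for your account is not given here and must come from your agreement with sales.

```python
import threading
from contextlib import contextmanager


class StreamLimitReached(RuntimeError):
    """Raised locally when all stream slots are already in use."""


class StreamSlots:
    """Counts open streams (video + audio combined) against a fixed maximum."""

    def __init__(self, max_streams):
        self._slots = threading.BoundedSemaphore(max_streams)

    @contextmanager
    def slot(self):
        # Non-blocking acquire: fail fast instead of queueing behind live streams.
        if not self._slots.acquire(blocking=False):
            raise StreamLimitReached("all local stream slots are in use")
        try:
            yield  # start the stream here; stop it before leaving the block
        finally:
            self._slots.release()
```

Wrap each start/stop pair in `with slots.slot():` so the slot is released even if the stream fails. This only guards a single process; the server can still return status code 16 if other clients on the same account hold streams.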

Social Media Scraping

Rate limits for scraping endpoints vary by endpoint type and the channel parameter used.

Metadata & Transcript Endpoints

These endpoints accept a channel parameter (rapid / memories.ai / apify). Rate limits are enforced per channel and are expressed in queries per second (QPS), per minute (QPM), or per hour (QPH).
Endpoint | Channel | Rate Limit
YouTube Video Metadata | rapid | 12 QPH
YouTube Video Metadata | memories.ai | 10 QPS
YouTube Video Metadata | apify | 10 QPS
TikTok Video Metadata | rapid / memories.ai | 600 QPM
TikTok Video Metadata | apify | 10 QPS
Instagram Video Metadata | rapid / memories.ai | 25 QPH
Instagram Video Metadata | apify | 10 QPS
Twitter Video Metadata | rapid / memories.ai | 20 QPH
Twitter Video Metadata | apify | 10 QPS
YouTube Video Transcript | rapid / memories.ai | 150 QPM
YouTube Video Transcript | apify | 10 QPS
TikTok Video Transcript | rapid / memories.ai | 600 QPM
TikTok Video Transcript | apify | 10 QPS
Instagram Video Transcript | rapid / memories.ai | 150 QPM
Instagram Video Transcript | apify | 10 QPS
Twitter Video Transcript | rapid / memories.ai | 20 QPH
Twitter Video Transcript | apify | 10 QPS
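
Because the limits above mix per-second, per-minute, and per-hour units, it can be easier to compare them as a minimum spacing between requests. The helper below is an illustrative sketch; `min_interval_seconds` and `UNIT_SECONDS` are hypothetical names, and it assumes QPM and QPH mean queries per minute and per hour.

```python
# Seconds in each unit used by the limits table.
UNIT_SECONDS = {"QPS": 1, "QPM": 60, "QPH": 3600}


def min_interval_seconds(limit):
    """Convert a limit such as '12 QPH' into the minimum spacing between requests."""
    count, unit = limit.split()
    return UNIT_SECONDS[unit] / int(count)


for limit in ("12 QPH", "20 QPH", "25 QPH", "150 QPM", "600 QPM", "10 QPS"):
    print(f"{limit:>8} -> one request every {min_interval_seconds(limit):g} s")
```

For example, 12 QPH works out to one request every 300 seconds, while 600 QPM and 10 QPS both work out to one request every 0.1 seconds. How the server counts requests within each window is not specified here, so even spacing is a conservative choice rather than a requirement.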

Detail & Comment Endpoints

These endpoints do not use a channel parameter.

Video Understanding Models — Model Provider Rate Limit

Memories.ai does not impose its own QPS limit on these endpoints. The effective rate limit is determined by the underlying model provider (e.g., Google Gemini, Amazon Nova, Alibaba Qwen). If you exceed the provider’s throughput limit, the API will return an error. Usage is also subject to your account’s token quota and billing limits.
If you are choosing between providers, see Video Model Selection or Image Model Selection before optimizing for rate limits alone.
Endpoint | Rate Limit
Gemini Video | Subject to Gemini rate limit
Nova Video | Subject to Nova rate limit
Qwen Video | Subject to Qwen rate limit

Image Understanding Models — Model Provider Rate Limit

Endpoint | Rate Limit
Gemini Image | Subject to Gemini rate limit
GPT Image | Subject to GPT rate limit
Nova Image | Subject to Nova rate limit
Qwen Image | Subject to Qwen rate limit

What Happens When You Exceed the Limit?

If you exceed the rate limit, the API will return a 429 Too Many Requests response:
{
  "code": 429,
  "msg": "Rate limit exceeded",
  "data": null
}
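
The response above carries the number 429 both as the HTTP status and in the `code` field of the JSON body. A client that wants to be defensive can check both, as in this small sketch. The `is_rate_limited` name is hypothetical, and treating a body `code` of 429 as a rate limit signal even when the HTTP status differs is an assumption, not documented behavior.

```python
def is_rate_limited(status_code, body):
    """Return True if either the HTTP status or the JSON body reports a 429."""
    if status_code == 429:
        return True
    # Assumption: the body uses the {"code", "msg", "data"} envelope shown above.
    return isinstance(body, dict) and body.get("code") == 429
```
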
Recommended retry strategy: Implement exponential backoff starting with a 1-second delay, doubling each retry, up to a maximum of 32 seconds.
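
import time
import requests

def request_with_retry(url, headers, data, max_retries=5):
    """POST with exponential backoff on 429 responses."""
    response = None
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data)
        # Anything other than 429 (success or a different error) goes back to the caller.
        if response.status_code != 429:
            return response
        # Back off 1s, 2s, 4s, ... capped at 32s; skip the sleep after the final attempt.
        if attempt < max_retries - 1:
            time.sleep(min(2 ** attempt, 32))
    # Still rate limited after max_retries attempts: return the last 429 response.
    return response
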
Repeated rate limit violations may result in temporary suspension of your API key. Please ensure your application respects the rate limits.
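
To make the recommended backoff schedule concrete, the sketch below lists the delay before each retry using the same formula (start at 1 second, double each time, cap at 32 seconds). `backoff_delays` and its parameters are hypothetical names used only for illustration.

```python
def backoff_delays(max_retries, base=1, cap=32):
    """Delays in seconds before each retry: base, 2*base, 4*base, ... capped at cap."""
    return [min(base * 2 ** attempt, cap) for attempt in range(max_retries)]


print(backoff_delays(5))  # [1, 2, 4, 8, 16]
print(backoff_delays(7))  # [1, 2, 4, 8, 16, 32, 32]
```

With five retries the 32-second cap is never reached; it only takes effect from the sixth retry onward.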

Need Higher Rate Limits?

If your use case requires higher throughput, we offer customized rate limit plans for enterprise customers.

Contact Sales

Get in touch with our sales team to discuss a custom rate limit plan tailored to your needs.
Enterprise plans can include increased QPS/QPM limits, dedicated infrastructure, and priority support.