Video Caption

Product: Visual Intelligence - Human ReID & Caption
Use case: Identity-aware vision analysis. Generate captions for videos and images, with optional reference photos to name specific people in the output (Human Re-identification).
Host: https://security.memories.ai
Auth: Dedicated API key required; contact support@memories.ai

Analyze a video and generate a natural-language caption, summary, or Q&A response. Optionally provide reference images to enable Human Re-identification (ReID); the model will detect and name those individuals in its output.

Base URL: https://security.memories.ai

Access to this API requires a dedicated API key separate from the standard Memories.ai key. Contact support@memories.ai to request access.

Endpoints

Method	Endpoint	Mode
`POST`	`/v1/understand/upload`	Async — by URL
`POST`	`/v1/understand/uploadFile`	Async — by local file
`POST`	`/v1/understand/uploadSync`	Sync — by URL (result returned immediately)
`POST`	`/v1/understand/uploadFileSync`	Sync — by local file (result returned immediately)

Async endpoints require a callback URL. Sync endpoints return the result directly in the response — suitable for shorter videos.
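The four endpoints differ along two axes: URL versus local file, and async versus sync. A minimal sketch of picking the path from those two choices follows; `endpoint_for` and `needs_callback` are hypothetical helper names for illustration, not part of the API.

```python
BASE_URL = "https://security.memories.ai"

# (source is a local file?, synchronous?) -> endpoint path, per the table above
_ENDPOINTS = {
    (False, False): "/v1/understand/upload",
    (True, False): "/v1/understand/uploadFile",
    (False, True): "/v1/understand/uploadSync",
    (True, True): "/v1/understand/uploadFileSync",
}


def endpoint_for(local_file: bool, sync: bool) -> str:
    """Return the full URL for the requested upload mode."""
    return BASE_URL + _ENDPOINTS[(local_file, sync)]


def needs_callback(sync: bool) -> bool:
    """Async endpoints require a callback URL; sync endpoints do not use one."""
    return not sync
```

For example, `endpoint_for(local_file=False, sync=True)` yields the `/v1/understand/uploadSync` URL, and `needs_callback(sync=False)` is `True`.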

Request Examples

import requests

# Async analysis by URL: the result is delivered to the callback URL.
url = "https://security.memories.ai/v1/understand/upload"
headers = {"Authorization": "sk-mavi-..."}  # dedicated API key

payload = {
    "video_url": "https://example.com/video.mp4",
    "user_prompt": "Summarize the video and identify any known persons.",
    "system_prompt": "You are a video understanding system.",
    "callback": "https://your.app/callback",
    # Optional reference photos for Human Re-identification (up to 5)
    "persons": [
        {"name": "Alice", "url": "https://example.com/alice.jpg"},
        {"name": "Bob",   "url": "https://example.com/bob.jpg"}
    ],
    "thinking": False
}

response = requests.post(url, headers=headers, json=payload, timeout=30)
response.raise_for_status()  # surface HTTP-level errors early
print(response.json())

Parameters

video_url

string

Publicly accessible video URL. Required for URL-based endpoints (/upload, /uploadSync). Omit when uploading a local file.

user_prompt

string

required

Instruction for the analysis — e.g. "Summarize the video and identify key persons".

system_prompt

string

required

Role or context for the AI — e.g. "You are a video understanding system.".

callback

string

Public URL to receive the async result. Required for /upload and /uploadFile. Not used for sync endpoints.

persons

array

Reference people for Human Re-identification. Up to 5 entries. Each entry:

name (string, required) — identifier for the person
url (string) — publicly accessible reference image URL

thinking

boolean

default:"false"

Enable reasoning mode for more detailed analysis.

reasoning_effort

string

Only applies when thinking is true. Level 1–10; higher values use more tokens. Default -1 (model decides automatically).

boolean

default:"false"

true for conversational Q&A style; false for information retrieval / caption style.
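The parameter rules above can be checked client-side before sending a request. The following is a minimal sketch, not an official client: `build_payload` is a hypothetical helper, and because `reasoning_effort` is documented as a string while its levels are numeric, sending the level as `str(level)` is an assumption.

```python
from typing import Any, Dict, List, Optional

MAX_PERSONS = 5  # documented limit on reference people per request


def build_payload(
    user_prompt: str,
    system_prompt: str,
    video_url: Optional[str] = None,
    callback: Optional[str] = None,
    persons: Optional[List[Dict[str, str]]] = None,
    thinking: bool = False,
    reasoning_effort: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a request body, enforcing the documented constraints."""
    if not user_prompt or not system_prompt:
        raise ValueError("user_prompt and system_prompt are required")

    persons = persons or []
    if len(persons) > MAX_PERSONS:
        raise ValueError(f"at most {MAX_PERSONS} persons entries are allowed")
    for person in persons:
        if not person.get("name"):
            raise ValueError("each persons entry needs a name")

    payload: Dict[str, Any] = {
        "user_prompt": user_prompt,
        "system_prompt": system_prompt,
        "thinking": thinking,
    }
    if video_url:  # omit when uploading a local file
        payload["video_url"] = video_url
    if callback:  # only used by the async endpoints
        payload["callback"] = callback
    if persons:
        payload["persons"] = persons

    if reasoning_effort is not None:
        if not thinking:
            raise ValueError("reasoning_effort only applies when thinking is true")
        if reasoning_effort != -1 and not 1 <= reasoning_effort <= 10:
            raise ValueError("reasoning_effort must be -1 or in the range 1-10")
        # Assumption: the string-typed field carries the numeric level as text.
        payload["reasoning_effort"] = str(reasoning_effort)
    return payload
```

Validating locally gives a clearer error than waiting for the service to reject the request, at the cost of duplicating rules that may change on the server side.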

Video Requirements

Max file size: 8 GB
Max duration: 2 hours
Reference images (ReID): up to 5 per request
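These limits can be checked before uploading. A minimal sketch, assuming "8 GB" means 8 GiB (the source does not specify) and that the caller already knows the video duration; `check_limits` is a hypothetical helper name.

```python
from typing import List

MAX_FILE_BYTES = 8 * 1024**3        # assumption: "8 GB" interpreted as 8 GiB
MAX_DURATION_SECONDS = 2 * 60 * 60  # 2 hours
MAX_REFERENCE_IMAGES = 5


def check_limits(
    size_bytes: int,
    duration_seconds: float,
    n_reference_images: int = 0,
) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds the 8 GB size limit")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("video exceeds the 2 hour duration limit")
    if n_reference_images > MAX_REFERENCE_IMAGES:
        problems.append("more than 5 reference images supplied")
    return problems
```

For a local file, `os.path.getsize(path)` supplies `size_bytes`; obtaining the duration needs a media tool and is left out of this sketch.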

Response

Async (`/upload`, `/uploadFile`)

{
  "code": 0,
  "msg": "success",
  "data": {
    "task_id": "8e03075a-2230-4e67-98d4-ba53f37c807a"
  }
}

The result is delivered to your callback URL when processing completes:

{
  "status": 0,
  "task_id": "8e03075a-2230-4e67-98d4-ba53f37c807a",
  "data": {
    "text": "Alice enters the room and greets two colleagues. Bob arrives shortly after.",
    "token": { "input": 123, "output": 456, "total": 579 }
  }
}
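A receiver for this callback might look like the sketch below, built on Python's standard `http.server`. Several points are assumptions rather than documented behavior: that the callback arrives as an HTTP POST with a JSON body, that `status` of 0 signals success (by analogy with `code`), and the port number. `parse_callback`, `CallbackHandler`, and `serve` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_callback(body: bytes) -> dict:
    """Extract the fields shown in the callback example above."""
    payload = json.loads(body)
    data = payload.get("data") or {}
    return {
        "task_id": payload.get("task_id"),
        "ok": payload.get("status") == 0,  # assumption: 0 means success
        "text": data.get("text"),
        "token": data.get("token"),
    }


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):  # assumption: the callback is delivered via POST
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = parse_callback(self.rfile.read(length))
        except (ValueError, AttributeError):
            # Malformed JSON or an unexpected top-level type
            self.send_response(400)
            self.end_headers()
            return
        print(result["task_id"], result["text"])
        # A 2xx response acknowledges the delivery.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted (port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```

Calling `serve()` starts the listener; the `callback` value in the request must then point at a publicly reachable URL that routes to it.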

Sync (`/uploadSync`, `/uploadFileSync`)

{
  "code": 0,
  "msg": "success",
  "data": {
    "data": {
      "text": "The video shows a product demo in a conference room.",
      "token": { "input": 200, "output": 80, "total": 280 }
    }
  }
}

Response Fields

code

integer

0 = success, -1 = failure.

data.task_id

string

Async only. Use this to correlate callback notifications.

data.data.text

string

Sync only. Generated caption, summary, or analysis text. In the async callback payload, the same text appears at data.text.

data.data.token

object

Sync only. Token usage: input, output, total. In the async callback payload, the same object appears at data.token.
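The fields above can be read out of a decoded response body as follows. This is a minimal sketch that assumes the body has already been parsed from JSON into a dict; `extract_task_id` and `extract_sync_result` are hypothetical helper names.

```python
from typing import Tuple


def _ensure_success(body: dict) -> None:
    """Raise if the envelope reports failure (code 0 = success, -1 = failure)."""
    if body.get("code") != 0:
        raise RuntimeError(
            f"request failed: code={body.get('code')} msg={body.get('msg')}"
        )


def extract_task_id(body: dict) -> str:
    """Async response: return the task ID used to correlate the callback."""
    _ensure_success(body)
    return body["data"]["task_id"]


def extract_sync_result(body: dict) -> Tuple[str, dict]:
    """Sync response: return (text, token usage) from the nested data object."""
    _ensure_success(body)
    inner = body["data"]["data"]
    return inner["text"], inner["token"]
```

With the sync example above, `extract_sync_result` returns the caption string together with the `{"input": 200, "output": 80, "total": 280}` usage object.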

Human Re-identification (ReID)

Pass a persons array to identify specific individuals in the video. The model compares reference images against people detected in the video and names them in the output text.

"persons": [
    {"name": "Alice", "url": "https://example.com/alice.jpg"},
    {"name": "Bob",   "url": "https://example.com/bob.jpg"}
]

ReID is a feature of this endpoint — not a separate API call. Results appear naturally in the generated text (e.g., "Alice enters first, followed by Bob.").
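Because the documented response carries names only inside the generated text, a caller who needs to know which reference people were recognized has to search that text. The sketch below uses a case-sensitive whole-word match; this is a heuristic chosen for illustration, not a documented matching rule, and `people_mentioned` is a hypothetical helper.

```python
import re
from typing import Dict, List


def people_mentioned(text: str, persons: List[Dict[str, str]]) -> List[str]:
    """Return the names from `persons` that occur as whole words in `text`."""
    found = []
    for person in persons:
        name = person["name"]
        # re.escape guards against names containing regex metacharacters.
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.append(name)
    return found
```

A name that is also a common word, or one the model paraphrases, will defeat this kind of matching, so distinctive `name` values make the output easier to post-process.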

Notes

Callback delivery is retried up to 5 times until your endpoint returns 2xx.
Sync endpoints are best suited for videos under a few minutes. For long videos, use async endpoints.
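Since callback delivery is retried until a 2xx response is received, a receiver may defensively guard against handling the same `task_id` twice, for example when its acknowledgement is lost. The source does not state when duplicates can occur, so this is a precaution rather than documented behavior. A minimal in-memory sketch follows; `CallbackDeduplicator` is a hypothetical class, and a real deployment would likely use persistent storage instead of a set.

```python
import threading


class CallbackDeduplicator:
    """Remember which task IDs have already been processed."""

    def __init__(self) -> None:
        self._seen = set()
        self._lock = threading.Lock()  # handlers may run on several threads

    def first_delivery(self, task_id: str) -> bool:
        """Return True the first time a task ID is seen, False afterwards."""
        with self._lock:
            if task_id in self._seen:
                return False
            self._seen.add(task_id)
            return True
```

A handler would call `first_delivery(task_id)`, process the result only when it returns `True`, and return 2xx in both cases so that retries stop.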
