Product: Visual Agents (composing Visual Search + Visual Intelligence endpoints)
Repo: Memories-ai-labs/examples (MIT-licensed, one Jupyter notebook per agent)
Auth: Authorization: sk-mavi-... (no Bearer prefix)

The notebooks live in the Memories-ai-labs/examples repo. Each notebook is self-contained: no shared library imports, and every API call is inlined with comments explaining the wire shape and gotchas.
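
As a minimal sketch of the auth convention above, the helper below puts the raw key in the Authorization header. The function name `auth_headers` and the `MEMORIES_API_KEY` environment variable are hypothetical names chosen for illustration; only the "no Bearer prefix" rule comes from this page.

```python
import os


def auth_headers(api_key=None):
    """Build request headers carrying the raw API key (no "Bearer " prefix).

    MEMORIES_API_KEY is a hypothetical env var name used for this sketch.
    """
    key = api_key or os.environ.get("MEMORIES_API_KEY", "")
    if not key:
        raise ValueError("no API key provided")
    if key.lower().startswith("bearer "):
        # Guard against the common mistake of pasting a Bearer-style value.
        key = key[len("bearer "):].strip()
    return {"Authorization": key}
```

The returned dict can be passed as the `headers=` argument of a `requests` call.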
Why notebooks?
A customer evaluating Memories.ai wants to see the actual request body, the actual response, and the reasoning step in context, not eight layers of abstraction. Each notebook interleaves markdown (what and why) with code (the API call) and the live response, so it reads top-to-bottom as a walkthrough you can re-run.
Quickstart
Open 00_search_api_overview.ipynb and click Run All. Real API hits come back in seconds.
The notebooks
Open them in this order if you’re new to the platform: the search-API overview is the foundation every agent notebook builds on.
00. Search API Overview
Every /search variant: semantic BY_CLIP, BY_AUDIO, exact-phrase transcripts, search by tag, by camera, time-windowed, plus composable filters. Read this first.
01. SOP Compliance — QSR
Verify drive-thru staff handoffs, greetings, drink inclusion. Search candidate moments → VLM verify each with a strict JSON schema.
02. Service Quality
Service-event timeline from a floor cam → table touches, inter-course time, bounce count. No VLM needed — just multiple semantic searches.
03. Security & Threat
Six scenarios (shoplifting, scanner bypass, masked entry, slip-and-fall, restricted-area breach, altercation) with severity-tagged incident log.
04. SOP Compliance — Auto
Same SOP pattern as 01, but for an automotive service bay. Per-arrival audit: greeting within 60s? Air filter checked?
05. Video Searching Agent
Discover videos across YouTube, TikTok, Instagram, X via the managed /queries/stream SSE endpoint. Includes an inline SSE parser.
06. Video Editing (VEA)
Async highlight-reel pipeline via /video/clip + /video/edit; the final asset arrives at your configured webhook.
07. Personal Memory (LUCI)
Date-windowed natural-language questions over personal recordings. Combines datetime_taken filter + transcript lookup + VLM identification.
08. Visual RAG
Two-channel retrieve (BY_CLIP + BY_AUDIO) → merge overlapping time-ranges → VLM-verify each candidate with citations.
09. Creator Intelligence
Per-video VLM scoring (production, audio, delivery, hook, brand safety) → aggregated creator scorecard with recommendation.
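
The "merge overlapping time-ranges" step described for notebook 08 can be sketched as a standard interval merge. This is an illustrative version, not the notebook's own code: the function name `merge_time_ranges`, the `gap` parameter, and the sample ranges are assumptions.

```python
def merge_time_ranges(ranges, gap=0.0):
    """Merge overlapping (start, end) ranges given in seconds.

    Ranges separated by at most `gap` seconds are merged as well.
    Returns a list of tuples sorted by start time.
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + gap:
            # Overlaps (or nearly touches) the previous range: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


clip_hits = [(10.0, 18.0), (40.0, 45.0)]   # hypothetical BY_CLIP ranges
audio_hits = [(15.0, 22.0), (60.0, 63.0)]  # hypothetical BY_AUDIO ranges
candidates = merge_time_ranges(clip_hits + audio_hits)
print(candidates)  # [(10.0, 22.0), (40.0, 45.0), (60.0, 63.0)]
```

Merging before the VLM step means each distinct moment is verified once, even when both channels found it.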
How each notebook is structured
Every notebook follows the same shape:

| Section | Content |
|---|---|
| Title + use case | Markdown — what the agent does, when to use it |
| Setup | One code cell with import requests, host config, API key wiring |
| Helper functions | Code cells — one helper per endpoint with a docstring explaining the wire shape and gotchas (e.g. Gemini’s choices[].text envelope, code=0001 transient retry, code-fence stripping) |
| Step 1: Retrieve | Markdown → code cell with the search call → real output |
| Step 2: Reason | Markdown → code cell with the VLM call → real JSON output |
| Step 3: Aggregate | Markdown → domain-specific aggregation code |
| Where to go next | Markdown — how to customize, what to swap out for production |
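
One of the gotchas listed in the table, code-fence stripping, can be illustrated with a small helper. This is a sketch under the assumption that the VLM sometimes wraps its JSON answer in a markdown code fence; the names `strip_code_fence` and `parse_vlm_json` are hypothetical and not taken from the notebooks.

```python
import json

FENCE = "`" * 3  # a markdown code fence marker


def strip_code_fence(text):
    """Return text without a surrounding markdown code fence, if present."""
    cleaned = text.strip()
    if (len(cleaned) >= 2 * len(FENCE)
            and cleaned.startswith(FENCE) and cleaned.endswith(FENCE)):
        body = cleaned[len(FENCE):-len(FENCE)]
        # Drop a purely alphabetic language tag (e.g. "json") on the opening line.
        first_line, sep, rest = body.partition("\n")
        if sep and first_line.strip().isalpha():
            body = rest
        cleaned = body.strip()
    return cleaned


def parse_vlm_json(text):
    """Strip an optional code fence, then parse the remainder as JSON."""
    return json.loads(strip_code_fence(text))
```

Unfenced JSON passes through unchanged, so the helper is safe to apply to every VLM reply.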
The shared ReAct skeleton
Agents 1, 3, 4, 7, 8 walk the same four-step loop:

| Step | Endpoint(s) | What it does |
|---|---|---|
| 1. Index | POST /upload, GET /get_metadata (poll until status=PARSE) | Get the footage into your private library and wait for indexing |
| 2. Retrieve | POST /search (semantic, BY_CLIP / BY_AUDIO), GET /search_audio_transcripts (exact phrase) | Find candidate moments by natural language |
| 3. Reason | POST /vu/chat/completions (Gemini / Qwen / Nova VLM) | Verify each candidate, extract structured facts |
| 4. Loop | Per-notebook control flow over steps 2 & 3 | Iterate until the answer is grounded |
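
The control flow over steps 2 and 3 can be sketched with the network calls injected as plain functions, so the loop itself runs without an API key. Everything here is illustrative: `run_agent`, the stub functions, and the field names (`video_no`, `start`, `score`, `verified`) are assumptions, not the actual response schema.

```python
def run_agent(queries, search_fn, verify_fn, max_candidates=5):
    """Retrieve candidates for each query, then keep those the VLM confirms.

    search_fn(query) -> list of candidate dicts (stand-in for POST /search)
    verify_fn(candidate) -> dict with a boolean "verified" key
                            (stand-in for the VLM call)
    """
    findings = []
    for query in queries:
        for candidate in search_fn(query)[:max_candidates]:
            verdict = verify_fn(candidate)
            if verdict.get("verified"):
                findings.append(
                    {"query": query, "candidate": candidate, "verdict": verdict}
                )
    return findings


# Stubs that let the loop be exercised offline.
def fake_search(query):
    return [
        {"video_no": "VI_demo", "start": 12.0, "score": 0.9},
        {"video_no": "VI_demo", "start": 48.0, "score": 0.4},
    ]


def fake_verify(candidate):
    return {"verified": candidate["score"] >= 0.5, "reason": "stubbed verdict"}


results = run_agent(["staff greets customer"], fake_search, fake_verify)
print(len(results))  # 1
```

In a real notebook the two stubs would be replaced by helpers that call the search and VLM endpoints.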

Three notebooks depart from this loop:
- 05, Video Searching, wraps the managed /queries/stream SSE endpoint. The agentic loop runs server-side; the notebook parses typed events (started, progress, tool_call, tool_result, error, complete).
- 06, Video Editing (VEA), is fire-and-forget against the async /video/edit endpoint. The final asset_id arrives at your webhook URL (configure at api-platform.memories.ai/webhooks).
- 09, Creator Intelligence, doesn’t use /search at all: it takes a list of video URLs and runs N VLM calls in series.
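
For the typed events of the SSE endpoint, a minimal parser might look like the sketch below. It assumes the stream uses the standard text/event-stream framing (`event:` and `data:` fields, blank line between events) and JSON payloads; the notebook's own inline parser and the real payload fields may differ, and the sample payloads are made up.

```python
import json


def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of decoded SSE text lines."""
    event, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                payload = "\n".join(data_lines)
                try:
                    payload = json.loads(payload)
                except json.JSONDecodeError:
                    pass  # leave non-JSON payloads as plain text
                yield event, payload
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)


sample = [
    "event: started\n", 'data: {"query": "cat videos"}\n', "\n",
    ": keep-alive\n",
    "event: complete\n", 'data: {"results": 3}\n', "\n",
]
events = list(parse_sse(sample))
print(events)  # [('started', {'query': 'cat videos'}), ('complete', {'results': 3})]
```

Because the parser accepts any iterable of text lines, it can be tested on a list, as here, before being pointed at a live stream.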
Hosting videos for the VLM
The Memories.ai VLM endpoint needs a publicly fetchable file_uri. The Visual Search /download endpoint streams the raw bytes back to you; it does not return a hosted URL. To wire up the full loop (index → search → VLM), you must bridge that gap yourself.
Each notebook uses a MEDIA_URL_MAP dict (or the MEMORIES_MEDIA_URL_TEMPLATE env var). For demo runs, the notebooks default to mapping the seed video to the public test asset (test_1min.mp4) so the VLM step works out of the box. For production, replace those entries with your own CDN URLs.
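
The lookup described above could be sketched as follows. The names MEDIA_URL_MAP and MEMORIES_MEDIA_URL_TEMPLATE come from this page; the `{video_no}` placeholder, the `resolve_media_url` helper, the `VI_demo` id, and the example.com URLs are assumptions made for illustration.

```python
import os

# Hypothetical entries: video id -> publicly fetchable URL.
MEDIA_URL_MAP = {
    "VI_demo": "https://cdn.example.com/test_1min.mp4",
}


def resolve_media_url(video_no, url_map=MEDIA_URL_MAP, env=os.environ):
    """Return a public URL for a video: explicit map first, then env template."""
    if video_no in url_map:
        return url_map[video_no]
    template = env.get("MEMORIES_MEDIA_URL_TEMPLATE")
    if template:
        # Assumes the template contains a "{video_no}" placeholder.
        return template.format(video_no=video_no)
    raise KeyError(f"no public URL configured for {video_no!r}")
```

Raising on a missing mapping surfaces the gap before the VLM call is made with an unreachable file_uri.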
Verification
Two notebooks were executed end-to-end via jupyter nbconvert --execute against api.memories.ai:
- 00_search_api_overview.ipynb: BY_CLIP returned 5 real hits with scores, BY_AUDIO returned 3 transcript hits with spoken-words snippets, /search_audio_transcripts returned 3 LIKE matches, and the tag / camera / datetime filters all returned 0 cleanly (expected, since this test account has no matching content).
- 09_creator_intelligence.ipynb: a real Gemini call returned {production_quality: 85, audio_quality: 10, delivery: 0, hook_strength: 70, brand_safety: 100} for the silent public test video.
/video/edit webhook contract.