Product: Visual Search
Use case: Upload videos and images, auto-index them, then search by natural language, image, or transcript phrase
Host: https://api.memories.ai/serve/api/v1
Auth: Authorization: sk-mavi-... (no Bearer prefix)
Use a natural-language query to retrieve the most semantically similar clip segments from your Private Video Library. This is the canonical “find me a moment” endpoint.
For related operations: Search by Image (library-wide) · Search by Image (within a video) · Search by Transcript.
Prerequisites
Request Example
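import requests

# The API key goes in the Authorization header as-is, with no "Bearer " prefix.
headers = {"Authorization": "sk-mavi-..."}

response = requests.post(
    "https://api.memories.ai/serve/api/v1/search",
    headers=headers,
    json={
        "search_param": "person walking on a beach at sunset",  # natural-language query
        "search_type": "BY_CLIP",      # return video clip segments
        "unique_id": "default",        # namespace/folder to search
        "top_k": 10,                   # at most 10 results
        "filtering_level": "medium",   # keep results with score >= 0.225
    },
    timeout=30,
)
response.raise_for_status()  # surface HTTP-level errors before parsing
print(response.json())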
Parameters
search_param: Natural-language search query. Must be non-empty.
search_type: What to return. One of:
BY_CLIP — video clip segments (default). BY_VIDEO is an accepted alias.
BY_AUDIO — audio moments (semantic match against transcripts).
BY_CAPTION — caption-level vector search against the video_transcript table; the response shape is different (see BY_CAPTION response below).
Server-side enum is [BY_CLIP, BY_VIDEO, BY_AUDIO, BY_IMAGE, BY_CAPTION]. BY_IMAGE is exposed via Search by Image (within a video).
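Because BY_VIDEO is only an alias and BY_IMAGE belongs to a different endpoint, a client may want to validate search_type before sending. A minimal sketch; SEARCH_TYPES and normalize_search_type are hypothetical client-side names, not part of the API:

```python
# search_type values this endpoint serves, per the Parameters list above.
# BY_IMAGE is in the server-side enum but is served by the Search by Image
# endpoints, so this sketch rejects it here.
SEARCH_TYPES = {"BY_CLIP", "BY_VIDEO", "BY_AUDIO", "BY_CAPTION"}


def normalize_search_type(search_type: str = "BY_CLIP") -> str:
    """Validate search_type and fold the BY_VIDEO alias into BY_CLIP."""
    if search_type == "BY_IMAGE":
        raise ValueError("BY_IMAGE: use the Search by Image endpoints instead")
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"unsupported search_type: {search_type!r}")
    # The server treats BY_VIDEO as BY_CLIP, so mapping it here means the
    # response-handling code only has to know about one name.
    return "BY_CLIP" if search_type == "BY_VIDEO" else search_type


print(normalize_search_type("BY_VIDEO"))  # BY_CLIP
```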
unique_id: Namespace scoping the search to a folder in your account.
top_k: Maximum number of results to return.
- For BY_CLIP / BY_AUDIO: range 1–1000.
- For BY_CAPTION: range 1–200 (server-side default is 10 when top_k is null; otherwise the value you send is used).
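The valid top_k range depends on search_type. A hypothetical client-side check (TOP_K_RANGE and check_top_k are illustrative names only) that rejects out-of-range values before the request is sent:

```python
from typing import Optional

# Inclusive top_k bounds per search_type, from the Parameters list above.
TOP_K_RANGE = {
    "BY_CLIP": (1, 1000),
    "BY_VIDEO": (1, 1000),   # alias of BY_CLIP
    "BY_AUDIO": (1, 1000),
    "BY_CAPTION": (1, 200),
}


def check_top_k(search_type: str, top_k: Optional[int]) -> Optional[int]:
    """Return top_k unchanged if it is valid for search_type; raise otherwise.

    None is passed through so the caller can leave top_k out of the request
    (for BY_CAPTION the server then uses 10).
    """
    if top_k is None:
        return None
    low, high = TOP_K_RANGE[search_type]
    if not low <= top_k <= high:
        raise ValueError(f"top_k={top_k} is outside {low}-{high} for {search_type}")
    return top_k
```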
filtering_level: Minimum similarity score a result must reach to be returned:
low — score ≥ 0.15
medium — score ≥ 0.225
high — score ≥ 0.4
Omit to return all results regardless of score.
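Since omitting filtering_level returns results regardless of score, one option for comparing strictness levels without a request per level is to apply the documented thresholds client-side. A minimal sketch; MIN_SCORE and passes_level are hypothetical names, and the server may apply its cut-off differently (for example, relative to top_k truncation):

```python
from typing import Optional

# Minimum score per filtering_level, as listed above (inclusive).
MIN_SCORE = {"low": 0.15, "medium": 0.225, "high": 0.4}


def passes_level(score: float, filtering_level: Optional[str]) -> bool:
    """Mirror the documented cut-off locally: keep results with score >= threshold."""
    if filtering_level is None:  # omitted: every result is kept
        return True
    return score >= MIN_SCORE[filtering_level]


scores = [0.52, 0.3, 0.2, 0.1]
print([s for s in scores if passes_level(s, "medium")])  # [0.52, 0.3]
```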
Restrict search to these specific videos. Accepts up to 100 video IDs.
Filter to content carrying this tag.
Filter by camera model (must match camera_model set at upload time).
Filter to content captured at or after this time. Format: yyyy-MM-dd HH:mm:ss.
GPS latitude filter. Must be paired with longitude.
GPS longitude filter. Must be paired with latitude.
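The optional filters above carry a few checkable rules: at most 100 video IDs, the yyyy-MM-dd HH:mm:ss time format, and latitude/longitude supplied as a pair. This page does not list the request field names for these filters, so the Python argument names in this sketch are placeholders, not the wire names; check_filters is likewise hypothetical:

```python
from datetime import datetime
from typing import Optional, Sequence

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # strptime spelling of yyyy-MM-dd HH:mm:ss


def check_filters(
    video_ids: Optional[Sequence[str]] = None,
    capture_time: Optional[str] = None,
    latitude: Optional[float] = None,
    longitude: Optional[float] = None,
) -> None:
    """Raise ValueError if an optional filter breaks a documented rule."""
    if video_ids is not None and len(video_ids) > 100:
        raise ValueError("at most 100 video IDs per request")
    if capture_time is not None:
        # strptime raises ValueError if the string does not match the format.
        datetime.strptime(capture_time, TIME_FORMAT)
    if (latitude is None) != (longitude is None):
        raise ValueError("latitude and longitude must be supplied together")
```

For more than 100 videos, one approach is to split the IDs into batches of 100, issue one search per batch, and merge the results by score.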
Response
{
  "code": "0000",
  "msg": "success",
  "data": [
    {
      "videoNo": "VI576925607808602112",
      "videoName": "1920447021987282945",
      "startTime": "13",
      "endTime": "18",
      "audio_ts": "the sun was setting over the water",
      "score": 0.5221236659362116,
      "video_bucket": "mavi-resource",
      "video_blob": "VI576925607808602112.mp4",
      "keyframe_bucket": "mavi-keyframe",
      "keyframe_blob": "<uuid>/keyframe-000013.jpg"
    }
  ],
  "success": true,
  "failed": false
}
videoNo: Unique identifier of the matched video.
videoName: Internal stored name of the video.
startTime: Start time of the matched segment, in seconds (a string in the example above).
endTime: End time of the matched segment, in seconds (a string in the example above).
audio_ts: Transcript text. Populated for BY_AUDIO searches when transcription is available.
score: Relevance score. Higher is more relevant.
video_bucket: GCS bucket of the original video file. Omitted when the storage location cannot be resolved.
video_blob: GCS blob (object) path of the original video. Use it with video_bucket at GET /serve/api/v2/download?bucket=&blob= to fetch the file directly.
keyframe_bucket: GCS bucket of the matched keyframe image (BY_CLIP only).
keyframe_blob: GCS blob (object) path of the matched keyframe image (BY_CLIP only).
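The fields above can be turned into typed records. A minimal sketch, assuming (as in the example) that code "0000" together with success: true marks a successful call, and that the v2 download route is served from the same host as the v1 API; ClipHit, parse_clip_hits, and DOWNLOAD_URL are hypothetical names. The sketch only builds the download URL; whether that GET needs the Authorization header is not stated on this page.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode

# Assumption: the v2 download route is served from the same host as v1.
DOWNLOAD_URL = "https://api.memories.ai/serve/api/v2/download"


@dataclass
class ClipHit:
    video_no: str
    start: float                  # seconds
    end: float                    # seconds
    score: float
    transcript: Optional[str]     # audio_ts, set for BY_AUDIO when available
    download_url: Optional[str]   # None when video_bucket/video_blob are omitted


def parse_clip_hits(body: Dict[str, Any]) -> List[ClipHit]:
    """Turn a BY_CLIP / BY_AUDIO response body into ClipHit records."""
    if body.get("code") != "0000" or not body.get("success"):
        raise RuntimeError(f"search failed: {body.get('msg')!r}")
    hits = []
    for item in body.get("data") or []:
        bucket, blob = item.get("video_bucket"), item.get("video_blob")
        url = None
        if bucket and blob:
            url = DOWNLOAD_URL + "?" + urlencode({"bucket": bucket, "blob": blob})
        hits.append(ClipHit(
            video_no=item["videoNo"],
            # startTime/endTime are strings in the example response.
            start=float(item["startTime"]),
            end=float(item["endTime"]),
            score=float(item["score"]),
            transcript=item.get("audio_ts"),
            download_url=url,
        ))
    return hits
```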
BY_CAPTION Response
When search_type=BY_CAPTION, the endpoint performs a vector similarity search over the video_transcript table and returns a different item shape:
{
  "code": "0000",
  "msg": "success",
  "data": [
    {
      "video_no": "VI576925607808602112",
      "text": "the sun was setting over the water",
      "vector": [0.012, -0.034, 0.005, "..."],
      "user_id": "<md5 of internal user key>",
      "start_time": 12.5,
      "end_time": 18.7,
      "score": 0.82
    }
  ],
  "success": true,
  "failed": false
}
video_no: Unique identifier of the matched video.
text: The caption segment text that matched.
vector: Stored embedding vector of the matched caption row. Its dimensionality is set by the embedding model and may run to hundreds of floats, so response size grows accordingly.
user_id: Internal MD5-encoded user namespace identifier (consistent across calls for the same unique_id).
start_time: Start time of the matched caption, in seconds.
end_time: End time of the matched caption, in seconds.
score: Similarity score (1 - distance). Higher is more similar.
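Because each BY_CAPTION item carries its full embedding, a client that only needs text and timing can drop the vector right after parsing. A minimal sketch; caption_hits is a hypothetical name, and dropping the vector saves memory in your process, not bytes on the wire. It also recovers the distance from the documented relation score = 1 - distance.

```python
from typing import Any, Dict, List


def caption_hits(body: Dict[str, Any], keep_vector: bool = False) -> List[Dict[str, Any]]:
    """Extract BY_CAPTION items, dropping the embedding unless requested."""
    hits = []
    for item in body.get("data") or []:
        hit = {k: v for k, v in item.items() if keep_vector or k != "vector"}
        # score is documented as 1 - distance, so distance = 1 - score.
        hit["distance"] = 1.0 - float(hit["score"])
        hits.append(hit)
    # Highest score (smallest distance) first.
    hits.sort(key=lambda h: h["score"], reverse=True)
    return hits
```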
Notes & Limits
- Rate limiting: Exceeding the per-account rate limit returns an error. See Rate limits.
- Billing: Each successful call deducts credits from your account balance.
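This page does not say how a rate-limit error is reported (HTTP status or body code), so the sketch below leaves that decision to a caller-supplied predicate. It is a generic exponential-backoff helper, not part of any Memories.ai SDK; with_backoff is a hypothetical name:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    send: Callable[[], T],
    is_rate_limited: Callable[[T], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(), retrying with exponential backoff while rate limited.

    Makes at most max_attempts calls and returns the last result, which may
    still be rate limited if every attempt was.
    """
    result = send()
    for attempt in range(1, max_attempts):
        if not is_rate_limited(result):
            break
        # Waits base_delay * 1, 2, 4, ... plus up to 10% random jitter.
        delay = base_delay * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, delay * 0.1))
        result = send()
    return result
```

With requests, a call might look like with_backoff(lambda: requests.post(url, headers=headers, json=payload, timeout=30), lambda r: r.status_code == 429), where treating HTTP 429 as the rate-limit signal is an assumption to verify against the Rate limits page.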
Request Body Schema
- search_param: Natural-language search query. Must be non-empty.
- search_type (enum<string>, default BY_CLIP): Search modality; one of BY_VIDEO, BY_CLIP, BY_AUDIO, BY_IMAGE, BY_CAPTION. BY_VIDEO is treated as BY_CLIP internally. BY_CAPTION performs vector search over the video_transcript table and returns a different item shape (see BY_CAPTION Response).
- unique_id: Scope/folder identifier for the authenticated account.
- top_k (required range: 1 <= x <= 1000): Maximum number of results to return. Range 1–1000 for BY_CLIP/BY_AUDIO/BY_IMAGE; for BY_CAPTION the range is 1–200 (server-side default is 10 when null).
- filtering_level (one of low, medium, high): Similarity-score filter. low = 0.15, medium = 0.225, high = 0.4.
- Video list (maximum array length: 100): Optional list of video numbers to restrict the search to.
- Camera model: Optional camera/device model filter. Matches the camera_model supplied at upload time.
- Capture time: Optional capture-time filter in format yyyy-MM-dd HH:mm:ss.
- Latitude: Optional latitude filter. Must be supplied together with longitude.
- Longitude: Optional longitude filter. Must be supplied together with latitude.
Response Schema
The shape of data depends on search_type:
- BY_CLIP / BY_VIDEO / BY_AUDIO: an array (object[]) of video-search items. Items carry video_bucket/video_blob and, for BY_CLIP, keyframe_bucket/keyframe_blob.
- BY_IMAGE: a paginated image-search object (object). Its items carry bucket/blob.
- BY_CAPTION: an array (object[]) of caption-search items carrying the embedding vector, text, user_id, and time range.
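The item shapes differ in naming as well as content: BY_CLIP / BY_VIDEO / BY_AUDIO items use camelCase keys with times given as strings in the example, while BY_CAPTION items use snake_case keys with float times. A hypothetical normalizer (normalize_hit is an illustrative name) that maps both onto one record; BY_IMAGE is left out because its item fields are only summarized here as carrying bucket/blob:

```python
from typing import Any, Dict


def normalize_hit(search_type: str, item: Dict[str, Any]) -> Dict[str, Any]:
    """Map one result item onto common keys: video_no, start, end, score."""
    if search_type in ("BY_CLIP", "BY_VIDEO", "BY_AUDIO"):
        # camelCase keys; startTime/endTime are strings in the example response.
        video_key, start_key, end_key = "videoNo", "startTime", "endTime"
    elif search_type == "BY_CAPTION":
        # snake_case keys; start_time/end_time are floats.
        video_key, start_key, end_key = "video_no", "start_time", "end_time"
    else:
        raise ValueError(f"no item mapping for search_type {search_type!r}")
    return {
        "video_no": item[video_key],
        "start": float(item[start_key]),
        "end": float(item[end_key]),
        "score": float(item["score"]),
    }
```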