Search by Text

Product: Visual Search
Use case: Upload videos and images, auto-index them, then search by natural language, image, or transcript phrase
Host: https://api.memories.ai/serve/api/v1
Auth: Authorization: sk-mavi-... (no Bearer prefix)

Use a natural-language query to retrieve the most semantically similar clip segments from your Private Video Library. This is the canonical “find me a moment” endpoint. For related operations: Search by Image (library-wide) · Search by Image (within a video) · Search by Transcript.

Prerequisites

You have created a memories.ai API key.
At least one video has been uploaded and is in PARSE status.

Request Example

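import requests

# The API key is sent as the raw header value, without a "Bearer" prefix.
headers = {"Authorization": "sk-mavi-..."}

response = requests.post(
    "https://api.memories.ai/serve/api/v1/search",
    headers=headers,
    json={
        "search_param": "person walking on a beach at sunset",  # natural-language query
        "search_type": "BY_CLIP",      # return video clip segments
        "unique_id": "default",        # namespace / folder to search
        "top_k": 10,                   # maximum number of results
        "filtering_level": "medium"    # keep results with score >= 0.225
    },
    timeout=30,  # avoid hanging indefinitely on network stalls
)
print(response.json())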

Parameters

search_param

string

required

Natural-language search query. Must be non-empty.

search_type

string

default:"BY_CLIP"

What to return:

BY_CLIP — video clip segments (default). BY_VIDEO is an accepted alias.
BY_AUDIO — audio moments (semantic match against transcripts).
BY_CAPTION — caption-level vector search against the video_transcript table; the response shape is different (see BY_CAPTION response below).

Server-side enum is [BY_CLIP, BY_VIDEO, BY_AUDIO, BY_IMAGE, BY_CAPTION]. BY_IMAGE is exposed via Search by Image (within a video).

unique_id

string

default:"default"

Namespace scoping the search to a folder in your account.

top_k

integer

default:100

Maximum results to return.

For BY_CLIP / BY_AUDIO: range 1 – 1000.
For BY_CAPTION: range 1 – 200 (server-side default is 10 when top_k is null; otherwise the value you send is used).
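The ranges above can be checked client-side before a request is sent. The following is a minimal sketch, not part of the API; `TOP_K_LIMITS` and `check_top_k` are hypothetical names introduced for illustration, and the limits are copied from the description above.

```python
# Allowed top_k ranges per search_type, as listed in the parameter description.
TOP_K_LIMITS = {
    "BY_CLIP": (1, 1000),
    "BY_AUDIO": (1, 1000),
    "BY_CAPTION": (1, 200),
}


def check_top_k(search_type: str, top_k: int) -> int:
    """Return top_k unchanged if it is within the documented range, else raise."""
    low, high = TOP_K_LIMITS[search_type]
    if not low <= top_k <= high:
        raise ValueError(
            f"top_k must be between {low} and {high} for {search_type}, got {top_k}"
        )
    return top_k
```

Calling `check_top_k("BY_CAPTION", 500)` raises a `ValueError` locally, so an out-of-range value is caught before any request is made.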

filtering_level

string

Minimum similarity score:

low — score ≥ 0.15
medium — score ≥ 0.225
high — score ≥ 0.4

Omit to return all results regardless of score.
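The filtering is applied by the server. The sketch below only illustrates what each level means for a given score; `THRESHOLDS` and `meets_level` are hypothetical names, and the cut-offs are the ones listed above.

```python
# Minimum scores per filtering_level, as documented above.
THRESHOLDS = {"low": 0.15, "medium": 0.225, "high": 0.4}


def meets_level(score, level=None):
    """Return True if a score would pass the given filtering_level.

    level=None mirrors omitting the parameter: every result is kept.
    """
    if level is None:
        return True
    return score >= THRESHOLDS[level]
```

For example, a result with score 0.2 passes `low` but not `medium`.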

video_nos

array

Restrict search to these specific videos. Accepts up to 100 video IDs.
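If you need to search more than 100 specific videos, one option is to split the IDs into batches and issue one request per batch. This is a minimal sketch of the splitting step only; `batch_video_nos` is a hypothetical helper, and merging results across requests is left to the caller.

```python
def batch_video_nos(video_nos, size=100):
    """Split a list of video IDs into chunks of at most `size` items."""
    return [video_nos[i:i + size] for i in range(0, len(video_nos), size)]
```

Each returned chunk can be passed as the `video_nos` value of a separate request.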

tag

string

Filter to content carrying this tag.

camera_tag

string

Filter by camera model (must match camera_model set at upload time).

datetime_taken

string

Filter to content captured at or after this time. Format: yyyy-MM-dd HH:mm:ss.
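In Python, the `yyyy-MM-dd HH:mm:ss` pattern corresponds to the `strftime` format `%Y-%m-%d %H:%M:%S`. A minimal sketch follows; `format_datetime_taken` is a hypothetical helper name. The documentation above does not state which time zone the server assumes, so the value is formatted as-is.

```python
from datetime import datetime


def format_datetime_taken(moment: datetime) -> str:
    """Format a datetime as yyyy-MM-dd HH:mm:ss for the datetime_taken filter."""
    return moment.strftime("%Y-%m-%d %H:%M:%S")
```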

latitude

number

GPS latitude filter. Must be paired with longitude.

longitude

number

GPS longitude filter. Must be paired with latitude.
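Because latitude and longitude must be sent together, it can help to enforce the pairing when building the request body. The sketch below assumes nothing beyond the two parameter names above; `add_location` is a hypothetical helper.

```python
def add_location(payload, latitude=None, longitude=None):
    """Return a copy of payload with both GPS fields set, or neither.

    Raises ValueError if only one of the two coordinates is given.
    """
    if (latitude is None) != (longitude is None):
        raise ValueError("latitude and longitude must be supplied together")
    result = dict(payload)  # copy so the caller's dict is not modified
    if latitude is not None:
        result["latitude"] = latitude
        result["longitude"] = longitude
    return result
```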

Response

{
    "code": "0000",
    "msg": "success",
    "data": [
        {
            "videoNo": "VI576925607808602112",
            "videoName": "1920447021987282945",
            "startTime": "13",
            "endTime": "18",
            "audio_ts": "the sun was setting over the water",
            "score": 0.5221236659362116,
            "video_bucket": "mavi-resource",
            "video_blob": "VI576925607808602112.mp4",
            "keyframe_bucket": "mavi-keyframe",
            "keyframe_blob": "<uuid>/keyframe-000013.jpg"
        }
    ],
    "success": true,
    "failed": false
}
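Note that `startTime` and `endTime` arrive as strings in the sample above. The following sketch converts them to numbers and orders results by score. `parse_clip_results` is a hypothetical helper, and treating any `code` other than "0000" as a failure is an assumption based on the success sample, not a documented rule.

```python
def parse_clip_results(body):
    """Turn a BY_CLIP / BY_AUDIO response body into a list sorted by score."""
    # Assumption: "0000" with success=true is the only success envelope.
    if body.get("code") != "0000" or not body.get("success"):
        raise RuntimeError(f"search failed: {body.get('code')} {body.get('msg')}")
    clips = []
    for item in body.get("data") or []:
        clips.append({
            "video_no": item["videoNo"],
            "start": float(item["startTime"]),  # seconds, sent as a string
            "end": float(item["endTime"]),      # seconds, sent as a string
            "score": item["score"],
        })
    # Highest relevance first.
    return sorted(clips, key=lambda clip: clip["score"], reverse=True)
```

With the request example from earlier in this page, this would be called as `parse_clip_results(response.json())`.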

data[].videoNo

string

Unique identifier of the matched video.

data[].videoName

string

Internal stored name of the video.

data[].startTime

string

Start time of the matched segment, in seconds.

data[].endTime

string

End time of the matched segment, in seconds.

data[].audio_ts

string

Transcript text. Populated for BY_AUDIO searches when transcription is available.

data[].score

number

Relevance score. Higher is more relevant.

data[].video_bucket

string

GCS bucket of the original video file. Omitted when the storage location cannot be resolved.

data[].video_blob

string

GCS blob (object) path of the original video. Use it with video_bucket at GET /serve/api/v2/download?bucket=&blob= to fetch the file directly.

data[].keyframe_bucket

string

GCS bucket of the matched keyframe image (BY_CLIP only).

data[].keyframe_blob

string

GCS blob (object) path of the matched keyframe image.
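The bucket and blob fields can be combined into a download URL for the `GET /serve/api/v2/download?bucket=&blob=` endpoint mentioned above. This is a minimal sketch: `API_ROOT` assumes the download endpoint is served from the same host as the search endpoint, and `download_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: the download endpoint lives on the same host as /serve/api/v1/search.
API_ROOT = "https://api.memories.ai"


def download_url(item):
    """Build the download URL for a result item, or None if storage fields are missing."""
    bucket = item.get("video_bucket")
    blob = item.get("video_blob")
    if not bucket or not blob:
        # video_bucket is omitted when the storage location cannot be resolved.
        return None
    query = urlencode({"bucket": bucket, "blob": blob})
    return f"{API_ROOT}/serve/api/v2/download?{query}"
```

`urlencode` percent-escapes characters such as `/` in blob paths, so the same approach can be adapted for `keyframe_bucket` and `keyframe_blob`.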

BY_CAPTION Response

When search_type=BY_CAPTION, the endpoint performs a vector similarity search over the video_transcript table and returns a different item shape:

{
    "code": "0000",
    "msg": "success",
    "data": [
        {
            "video_no": "VI576925607808602112",
            "text": "the sun was setting over the water",
            "vector": [0.012, -0.034, 0.005, "..."],
            "user_id": "<md5 of internal user key>",
            "start_time": 12.5,
            "end_time": 18.7,
            "score": 0.82
        }
    ],
    "success": true,
    "failed": false
}

data[].video_no

string

Unique identifier of the matched video.

data[].text

string

The caption segment text that matched.

data[].vector

array

Stored embedding vector of the matched caption row. Dimensionality is set by the embedding model and may be hundreds of floats — response size can grow accordingly.

data[].user_id

string

Internal MD5 hash identifying the user namespace (consistent across calls for the same unique_id).

data[].start_time

number

Start time of the matched caption, in seconds.

data[].end_time

number

End time of the matched caption, in seconds.

data[].score

number

Similarity score (1 - distance). Higher is more similar.
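Since each BY_CAPTION item carries its full embedding vector, you may want to drop that field before logging or storing results. A minimal sketch follows; `strip_vectors` is a hypothetical helper that relies only on the field names shown above.

```python
def strip_vectors(body):
    """Return BY_CAPTION items without the (potentially large) vector field."""
    return [
        {key: value for key, value in item.items() if key != "vector"}
        for item in body.get("data") or []
    ]
```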

Notes & Limits

Rate limiting: Exceeding the per-account rate limit returns an error. See Rate limits.
Billing: Each successful call deducts credits from your account balance.

Authorizations

Authorization

string

header

required

Your memories.ai API key (sk-mavi-...), sent as the raw header value with no Bearer prefix.

Body

application/json

search_param

string

required

Natural-language search query. Must be non-empty.

Example:

"boat in the ocean"

search_type

enum<string>

default:BY_CLIP

Search modality. BY_VIDEO is treated as BY_CLIP internally. BY_CAPTION performs vector search over the video_transcript table and returns a different item shape (see response).

Available options:

BY_VIDEO,

BY_CLIP,

BY_AUDIO,

BY_IMAGE,

BY_CAPTION

Example:

"BY_CLIP"

unique_id

string

default:default

Scope/folder identifier for the authenticated account.

top_k

integer

default:100

Maximum number of results to return. Range 1-1000 for BY_CLIP/BY_AUDIO/BY_IMAGE. For BY_CAPTION the range is 1-200 (server-side default is 10 when null).

Required range: 1 <= x <= 1000

filtering_level

enum<string>

Similarity-score filter. low=0.15, medium=0.225, high=0.4.

Available options:

low,

medium,

high

Example:

"medium"

video_nos

string[]

Optional list of video numbers to restrict the search to. Max 100.

Maximum array length: 100

Example:

["VI635764894954369024"]

tag

string

Optional tag filter.

Example:

"test1"

camera_tag

string

Optional camera/device model filter. Matches the camera_model supplied at upload time.

Example:

"Canon EOS 5D"

datetime_taken

string

Optional capture-time filter in format yyyy-MM-dd HH:mm:ss.

Example:

"2025-10-20 11:00:00"

latitude

number<double>

Optional latitude filter. Must be supplied together with longitude.

Example:

88.88

longitude

number<double>

Optional longitude filter. Must be supplied together with latitude.

Example:

88.88

Response

200 - application/json

Successful response

Response shape depends on search_type. For BY_CLIP / BY_VIDEO / BY_AUDIO data is an array of video-search items (carries video_bucket/video_blob and, for BY_CLIP, keyframe_bucket/keyframe_blob); for BY_IMAGE data is a paginated image-search object (items carry bucket/blob); for BY_CAPTION data is an array of caption-search items carrying the embedding vector, text, user_id, and time range.

code

string

Example:

"0000"

msg

string

Example:

"success"

data

object

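Option 1 · object[] (video-search items for BY_CLIP / BY_VIDEO / BY_AUDIO)
Option 2 · object (paginated image-search object for BY_IMAGE)
Option 3 · object[] (caption-search items for BY_CAPTION)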

success

boolean

Example:

true

failed

boolean

Example:

false
