Video Model Selection

Use this page when your request includes video. Memories.ai currently documents three video understanding providers: Gemini, Nova, and Qwen.

This is a selection guide for real API usage in Memories.ai. It focuses on request shape, output controls, and cost tradeoffs instead of benchmark leaderboards.

Quick Picks

Best default

Start with Gemini Video if you want one strong default for reasoning-heavy video analysis and structured extraction.

Lowest-cost batch

Start with Qwen Video if cost is the main constraint and you need large-scale or long-running video jobs.

Tool-first extraction

Start with Nova Video if your pipeline is designed around tool-style extraction via toolConfig.

Image-only task?

If the task does not contain video, use the dedicated Image Model Selection guide instead.

Choose By Workflow

If your goal is…	Start here	Why
General-purpose video understanding	Gemini Video	Best default for reasoning-heavy summaries, extraction, and mixed multimodal prompts
Lowest-cost video analysis	Qwen Video	Cheapest published starting price among the documented video providers
Tool-driven extraction pipeline	Nova Video	The docs explicitly expose `toolConfig` for structured extraction through tool specs
Schema-based structured output without tool calling	Gemini Video or Qwen Video	Both pages document schema-like structured output controls directly in the request body
A single provider to try first before tuning	Gemini Video	Strongest default choice when you do not yet know the quality/cost boundary of your workload

Provider Differences That Matter

Provider	Documented input shape	Structured output path	Thinking control	When it usually fits best
Gemini Video	`input_file` with `mime_type`; base64 is not supported for video input	`extra_body.metadata.responseSchema`	`thinking_config.thinking_budget`	Reasoning-heavy video analysis and multimodal prompts with image + video together
Nova Video	`video_url` plus text; tool config lives in `extra_body.metadata.toolConfig`	Tool-based extraction via `toolConfig`	Not documented on this page	Operational workflows that already expect tool-style outputs
Qwen Video	`video`, `image`, and `text` blocks in the same content array	`extra_body.metadata.response_format.json_schema`	`enable_thinking` + `thinking_budget`	Lowest-cost batch analysis and explicit thinking toggles
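
As a rough illustration of the table above, the sketch below records where each provider's structured-output control lives and writes a value at that dotted path in a plain dictionary. The names `STRUCTURED_OUTPUT_PATH`, `set_by_path`, and `attach_structured_output` are hypothetical helpers, not part of any Memories.ai SDK. Only the dotted paths come from the table; the rest of the request body is left out because this page does not specify it.

```python
from typing import Any

# Dotted paths copied from the "Structured output path" column above.
# For Nova the table describes tool-based extraction, so the value stored
# at this path would be a tool spec rather than a bare JSON schema.
STRUCTURED_OUTPUT_PATH = {
    "gemini": "extra_body.metadata.responseSchema",
    "nova": "extra_body.metadata.toolConfig",
    "qwen": "extra_body.metadata.response_format.json_schema",
}


def set_by_path(body: dict, dotted_path: str, value: Any) -> dict:
    """Set body[a][b][c] = value for dotted_path "a.b.c", creating dicts as needed."""
    *parents, leaf = dotted_path.split(".")
    node = body
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return body


def attach_structured_output(provider: str, body: dict, spec: Any) -> dict:
    """Place a schema (or tool spec) at the provider's documented path.

    Hypothetical helper: it only handles the three providers in the table.
    """
    try:
        path = STRUCTURED_OUTPUT_PATH[provider]
    except KeyError:
        raise ValueError(f"no documented structured-output path for {provider!r}")
    return set_by_path(body, path, spec)


# Usage: the schema here is a placeholder, not a documented example.
request_body = attach_structured_output("qwen", {}, {"type": "object"})
```

Keeping the paths in one mapping makes it easier to switch providers during evaluation without rewriting the code that builds the schema itself.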

Suggested Starting Models

Scenario	Recommended model
Default video summary / extraction	`gemini:gemini-3-flash-preview`
Cheapest first pass on large video volume	`qwen:qwen3-vl-flash`
Tool-oriented extraction workflow	`nova:us.amazon.nova-lite-v1:0`
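
The three model IDs above all appear to follow a `provider:model` pattern, and the Nova ID contains a second colon inside the model name. The sketch below assumes that the text before the first colon is the provider; that reading is inferred from the IDs in the table and is not confirmed on this page. `STARTING_MODELS` and `split_model_id` are hypothetical names used only for illustration.

```python
# Scenario keys are illustrative labels; the model IDs are copied from the table above.
STARTING_MODELS = {
    "default": "gemini:gemini-3-flash-preview",
    "cheapest_batch": "qwen:qwen3-vl-flash",
    "tool_extraction": "nova:us.amazon.nova-lite-v1:0",
}


def split_model_id(model_id: str) -> tuple:
    """Split "provider:model" on the first colon only.

    Assumption: the provider prefix never contains a colon, while the
    model part may (as in the Nova ID).
    """
    provider, sep, model = model_id.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {model_id!r}")
    return provider, model


provider, model = split_model_id(STARTING_MODELS["tool_extraction"])
```

Using `partition` rather than `split(":")` keeps the trailing `:0` attached to the Nova model name.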

Qwen's provider pages document pricing per 1K tokens. This guide converts the lowest published tier to a per-1M-token price for faster cross-provider comparison.
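
The conversion itself is a multiplication by 1,000, since 1M tokens is 1,000 blocks of 1K tokens. A minimal sketch follows; the function name `per_1k_to_per_1m` is hypothetical and the input price is a made-up placeholder, not a published Qwen rate.

```python
from decimal import Decimal


def per_1k_to_per_1m(price_per_1k: str) -> Decimal:
    """Convert a per-1K-token price to a per-1M-token price.

    The price is taken as a string and handled as Decimal so that small
    currency amounts are not distorted by binary floating point.
    """
    return Decimal(price_per_1k) * 1000


# Placeholder value for illustration only; check the provider page for real prices.
example_per_1m = per_1k_to_per_1m("0.0005")
```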

How To Evaluate On Your Own Data

Before standardizing on a video model, compare at least these cases:

short clips vs long videos
narration-heavy videos vs mostly visual videos
free-form summaries vs strict field extraction
low-volume interactive requests vs batch processing

The winner often changes depending on whether you optimize for quality, cost, or output control.
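
One way to make sure no combination of the four contrasts above is skipped is to enumerate them as a test matrix. The sketch below crosses all four axes, which is a design choice of this example rather than a requirement stated on this page; `AXES` and `evaluation_cases` are hypothetical names.

```python
from itertools import product

# The four contrasts listed above, one axis each.
AXES = {
    "length": ("short clip", "long video"),
    "content": ("narration-heavy", "mostly visual"),
    "output": ("free-form summary", "strict field extraction"),
    "traffic": ("low-volume interactive", "batch processing"),
}


def evaluation_cases() -> list:
    """Return every combination of the axis values as a list of dicts."""
    names = list(AXES)
    return [dict(zip(names, combo)) for combo in product(*AXES.values())]


cases = evaluation_cases()
```

Each entry in `cases` can then be paired with a sample video and a candidate model, and scored separately for quality, cost, and output control. If running the full cross is too expensive, a smaller subset that still covers both values of every axis is a reasonable starting point.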

Not In Scope

GPT is not currently documented as a video understanding endpoint in Memories.ai, so it is intentionally excluded from this page.
This page does not pin benchmark scores, RPM limits, or uptime claims, because those numbers age quickly and may differ by provider account or region.
