This is a selection guide for real API usage in Memories.ai. It focuses on request shape, output controls, and cost tradeoffs instead of benchmark leaderboards.
Quick Picks
Best default
Start with Gemini Video if you want one strong default for reasoning-heavy video analysis and structured extraction.
Lowest-cost batch
Start with Qwen Video if cost is the main constraint and you need large-scale or long-running video jobs.
Tool-first extraction
Start with Nova Video if your pipeline is designed around tool-style extraction via toolConfig.
Image-only task?
If the task does not contain video, use the dedicated Image Model Selection guide instead.
Choose By Workflow
| If your goal is… | Start here | Why |
|---|---|---|
| General-purpose video understanding | Gemini Video | Best default for reasoning-heavy summaries, extraction, and mixed multimodal prompts |
| Lowest-cost video analysis | Qwen Video | Cheapest published starting price among the documented video providers |
| Tool-driven extraction pipeline | Nova Video | The docs explicitly expose toolConfig for structured extraction through tool specs |
| Schema-based structured output without tool calling | Gemini Video or Qwen Video | Both pages document schema-like structured output controls directly in the request body |
| A single provider to try first before tuning | Gemini Video | Strongest default choice when you do not yet know the quality/cost boundary of your workload |
Provider Differences That Matter
| Provider | Documented input shape | Structured output path | Thinking control | When it usually fits best |
|---|---|---|---|---|
| Gemini Video | input_file with mime_type; video does not support base64 | extra_body.metadata.responseSchema | thinking_config.thinking_budget | Reasoning-heavy video analysis and multimodal prompts with image + video together |
| Nova Video | video_url plus text; tool config lives in extra_body.metadata.toolConfig | Tool-based extraction via toolConfig | Not documented on this page | Operational workflows that already expect tool-style outputs |
| Qwen Video | video, image, and text blocks in the same content array | extra_body.metadata.response_format.json_schema | enable_thinking + thinking_budget | Lowest-cost batch analysis and explicit thinking toggles |
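The structured output paths in the table differ only in where the payload sits under extra_body.metadata. The sketch below shows one way to keep that difference in a single lookup. The helper name `with_structured_output`, the short provider keys, and the example schema are hypothetical. The schema's inner shape is an assumption too, because this guide only names the field paths. Check the provider pages for how extra_body is passed in an actual call.

```python
from copy import deepcopy

# Field paths copied from the "Structured output path" column above.
# The short provider keys ("gemini", "nova", "qwen") are illustrative only.
STRUCTURED_OUTPUT_PATHS = {
    "gemini": ("extra_body", "metadata", "responseSchema"),
    "nova": ("extra_body", "metadata", "toolConfig"),
    "qwen": ("extra_body", "metadata", "response_format", "json_schema"),
}


def with_structured_output(request: dict, provider: str, payload: dict) -> dict:
    """Return a copy of `request` with `payload` placed at the provider's path."""
    path = STRUCTURED_OUTPUT_PATHS[provider]
    result = deepcopy(request)  # leave the caller's dict untouched
    node = result
    for key in path[:-1]:
        node = node.setdefault(key, {})  # create intermediate dicts as needed
    node[path[-1]] = payload
    return result


# Hypothetical schema; the exact payload shape each provider expects is an
# assumption and should be taken from that provider's page.
schema = {"type": "object", "properties": {"summary": {"type": "string"}}}
base = {"model": "qwen:qwen3-vl-flash"}
qwen_request = with_structured_output(base, "qwen", schema)
gemini_request = with_structured_output(
    {"model": "gemini:gemini-3-flash-preview"}, "gemini", schema
)
```

Keeping the paths in one table means a pipeline can switch providers during evaluation without rewriting the code that builds each request.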
Suggested Starting Models
| Scenario | Recommended model |
|---|---|
| Default video summary / extraction | gemini:gemini-3-flash-preview |
| Cheapest first pass on large video volume | qwen:qwen3-vl-flash |
| Tool-oriented extraction workflow | nova:us.amazon.nova-lite-v1:0 |
Qwen source pricing is documented per 1K tokens on its provider pages. This guide converts the lowest published tier into 1M token terms for faster cross-provider comparison.
How To Evaluate On Your Own Data
Before standardizing on a video model, compare at least these cases:
- short clips vs long videos
- narration-heavy videos vs mostly visual videos
- free-form summaries vs strict field extraction
- low-volume interactive requests vs batch processing
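One way to turn the four contrasts above into a concrete test plan is to cross them, so that every combination is run at least once. This is a minimal sketch: the axis names are hypothetical labels, and crossing all four axes is a suggestion rather than something this guide requires.

```python
from itertools import product

# The four contrasts listed above; the axis names are illustrative labels.
DIMENSIONS = {
    "length": ["short clip", "long video"],
    "content": ["narration-heavy", "mostly visual"],
    "output": ["free-form summary", "strict field extraction"],
    "traffic": ["low-volume interactive", "batch processing"],
}


def evaluation_cases(dimensions: dict) -> list:
    """Return one dict per combination of the given axes."""
    names = list(dimensions)
    return [dict(zip(names, combo)) for combo in product(*dimensions.values())]


cases = evaluation_cases(DIMENSIONS)
print(len(cases))  # 16 combinations (2 x 2 x 2 x 2)
```

Running each candidate model against the same case list keeps quality and cost comparisons on equal footing. If 16 cases is too many for a first pass, a subset that covers each value at least once is a reasonable starting point.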
Not In Scope
- GPT is not currently documented as a video understanding endpoint in Memories.ai, so it is intentionally excluded from this page.
- This page does not freeze benchmark scores, RPM limits, or uptime claims because those numbers age quickly and may differ by provider account or region.
