Use this page when your request includes video. Memories.ai currently documents three video understanding providers: Gemini, Nova, and Qwen.
This is a selection guide for real API usage in Memories.ai. It focuses on request shape, output controls, and cost tradeoffs instead of benchmark leaderboards.

Quick Picks

Best default

Start with Gemini Video if you want one strong default for reasoning-heavy video analysis and structured extraction.

Lowest-cost batch

Start with Qwen Video if cost is the main constraint and you need large-scale or long-running video jobs.

Tool-first extraction

Start with Nova Video if your pipeline is designed around tool-style extraction via toolConfig.

Image-only task?

If the task does not contain video, use the dedicated Image Model Selection guide instead.

Choose By Workflow

| If your goal is… | Start here | Why |
| --- | --- | --- |
| General-purpose video understanding | Gemini Video | Best default for reasoning-heavy summaries, extraction, and mixed multimodal prompts |
| Lowest-cost video analysis | Qwen Video | Cheapest published starting price among the documented video providers |
| Tool-driven extraction pipeline | Nova Video | The docs explicitly expose toolConfig for structured extraction through tool specs |
| Schema-based structured output without tool calling | Gemini Video or Qwen Video | Both pages document schema-like structured output controls directly in the request body |
| A single provider to try first before tuning | Gemini Video | Strongest default choice when you do not yet know the quality/cost boundary of your workload |

Provider Differences That Matter

| Provider | Documented input shape | Structured output path | Thinking control | When it usually fits best |
| --- | --- | --- | --- | --- |
| Gemini Video | input_file with mime_type; video does not support base64 | extra_body.metadata.responseSchema | thinking_config.thinking_budget | Reasoning-heavy video analysis and multimodal prompts with image + video together |
| Nova Video | video_url plus text; tool config lives in extra_body.metadata.toolConfig | Tool-based extraction via toolConfig | Not documented on this page | Operational workflows that already expect tool-style outputs |
| Qwen Video | video, image, and text blocks in the same content array | extra_body.metadata.response_format.json_schema | enable_thinking + thinking_budget | Lowest-cost batch analysis and explicit thinking toggles |
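
The structured output paths in the table differ per provider, so it can help to isolate that difference in one helper. The sketch below is illustrative only: the function name `structured_output_metadata` and the tool name `extract_fields` are hypothetical, the Nova branch assumes a Bedrock Converse-style tool spec, and the Qwen branch assumes the schema is passed directly as the `json_schema` value. Check each provider page for the exact body before relying on it.

```python
from typing import Any


def structured_output_metadata(provider: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Return a value for extra_body with the schema placed at the
    provider's documented structured output path (hypothetical helper)."""
    if provider == "gemini":
        # Documented path: extra_body.metadata.responseSchema
        metadata = {"responseSchema": schema}
    elif provider == "qwen":
        # Documented path: extra_body.metadata.response_format.json_schema
        # Assumption: the schema itself is the json_schema value.
        metadata = {"response_format": {"json_schema": schema}}
    elif provider == "nova":
        # Documented path: extra_body.metadata.toolConfig
        # Assumption: Bedrock Converse-style tool spec; the tool name is made up.
        metadata = {
            "toolConfig": {
                "tools": [
                    {
                        "toolSpec": {
                            "name": "extract_fields",
                            "inputSchema": {"json": schema},
                        }
                    }
                ]
            }
        }
    else:
        raise ValueError(f"no documented video provider named {provider!r}")
    return {"metadata": metadata}


# Example: the same schema routed to two different providers.
scene_schema = {"type": "object", "properties": {"title": {"type": "string"}}}
gemini_extra_body = structured_output_metadata("gemini", scene_schema)
nova_extra_body = structured_output_metadata("nova", scene_schema)
```

Keeping the provider switch in one place means the rest of a pipeline can treat the schema as provider-neutral and only this function changes if a provider page is updated.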

Suggested Starting Models

| Scenario | Recommended model |
| --- | --- |
| Default video summary / extraction | gemini:gemini-3-flash-preview |
| Cheapest first pass on large video volume | qwen:qwen3-vl-flash |
| Tool-oriented extraction workflow | nova:us.amazon.nova-lite-v1:0 |

Qwen source pricing is documented per 1K tokens on its provider pages. This guide converts the lowest published tier into 1M token terms for faster cross-provider comparison.
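
The per-1K to per-1M conversion mentioned above is a multiplication by 1,000. A minimal sketch, using a placeholder price that is not taken from any provider page (the function name `per_1k_to_per_1m` is also hypothetical):

```python
from decimal import Decimal


def per_1k_to_per_1m(price_per_1k_tokens: Decimal) -> Decimal:
    """Convert a price quoted per 1K tokens into a price per 1M tokens."""
    # 1M tokens = 1,000 blocks of 1K tokens.
    return price_per_1k_tokens * 1000


# Placeholder value for illustration only; look up the real tier on the provider page.
placeholder_price_per_1k = Decimal("0.0005")
price_per_1m = per_1k_to_per_1m(placeholder_price_per_1k)
```

`Decimal` is used instead of `float` so that small per-token prices do not pick up binary rounding noise when they are scaled or summed.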

How To Evaluate On Your Own Data

Before standardizing on a video model, compare at least these cases:
  • short clips vs long videos
  • narration-heavy videos vs mostly visual videos
  • free-form summaries vs strict field extraction
  • low-volume interactive requests vs batch processing
The winner often changes depending on whether you optimize for quality, cost, or output control.
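
One way to make the comparison above systematic is to enumerate the cases as a test matrix before running anything. The sketch below crosses the four contrasts from the list with the three suggested starting models; taking the full cross product is a design choice of this example, not a requirement of the guide, and the names `DIMENSIONS`, `MODELS`, and `build_eval_matrix` are hypothetical.

```python
from itertools import product

# The four contrasts listed above, one dimension each.
DIMENSIONS = {
    "length": ["short clip", "long video"],
    "content": ["narration-heavy", "mostly visual"],
    "output": ["free-form summary", "strict field extraction"],
    "traffic": ["low-volume interactive", "batch"],
}

# Suggested starting models from this guide.
MODELS = [
    "gemini:gemini-3-flash-preview",
    "qwen:qwen3-vl-flash",
    "nova:us.amazon.nova-lite-v1:0",
]


def build_eval_matrix(dimensions: dict[str, list[str]], models: list[str]) -> list[dict[str, str]]:
    """Return one row per (model, case) pair, where a case picks one value per dimension."""
    names = list(dimensions)
    rows = []
    for combo in product(*dimensions.values()):
        case = dict(zip(names, combo))
        for model in models:
            rows.append({"model": model, **case})
    return rows


matrix = build_eval_matrix(DIMENSIONS, MODELS)
```

Each row can then be paired with a sample video and scored on whichever axis matters for that workload (quality, cost, or output control). If the full matrix is too large, dropping a dimension that does not apply to your traffic is a reasonable first cut.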

Not In Scope

  • GPT is not currently documented as a video understanding endpoint in Memories.ai, so it is intentionally excluded from this page.
  • This page does not freeze benchmark scores, RPM limits, or uptime claims because those numbers age quickly and may differ by provider account or region.

Detailed Video Model Pages