Use this page when your request is image-only. Memories.ai currently documents Gemini, GPT, Nova, and Qwen image understanding endpoints.
This guide is intentionally practical. It focuses on the request and output patterns you will actually use in Memories.ai, not external benchmark scoreboards.

Quick Picks

Best default

Start with Gemini Image if you want one strong default for image reasoning and schema-based extraction.

Simplest JSON path

Start with GPT Image if you want the most familiar documented response_format path for JSON output.

Lowest-cost batch

Start with Qwen Image if cost is the main constraint and you need image analysis at scale.

Tool-first extraction

Start with Nova Image if your workflow is built around tool-style extraction via toolConfig.

Choose By Workflow

| If your goal is… | Start here | Why |
| --- | --- | --- |
| General-purpose image reasoning | Gemini Image | Strong default when you need reasoning plus schema-style structured output |
| Strict JSON extraction with a familiar API pattern | GPT Image | The page documents response_format directly and uses an OpenAI-style image request/response shape |
| Lowest-cost image analysis | Qwen Image | Cheapest published starting price among the documented image providers |
| Tool-driven extraction pipeline | Nova Image | The docs expose toolConfig for structured extraction through tool specs |
| Explicit thinking toggle plus schema output | Gemini Image or Qwen Image | Both document structured output plus explicit thinking-related fields |

Provider Differences That Matter

| Provider | Documented input shape | Structured output path | Thinking control | When it usually fits best |
| --- | --- | --- | --- | --- |
| Gemini Image | input_file with image MIME type | extra_body.metadata.responseSchema | thinking_config.thinking_budget | Default image reasoning with schema-based extraction |
| GPT Image | image_url blocks in OpenAI-style messages | response_format | Not documented on this page | Image-only JSON extraction with the simplest request/response mental model |
| Nova Image | image_url plus text; tool config lives in extra_body.metadata.toolConfig | Tool-based extraction via toolConfig | Not documented on this page | Tool-oriented extraction flows and low-cost operational usage |
| Qwen Image | image plus text blocks in the same content array | extra_body.metadata.response_format.json_schema | enable_thinking + thinking_budget | Lowest-cost image analysis with explicit thinking controls |
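
To make the structured-output column concrete, here is a minimal Python sketch that places one JSON schema at the field path each provider documents. Only the field paths (extra_body.metadata.responseSchema, response_format, extra_body.metadata.toolConfig, extra_body.metadata.response_format.json_schema) come from the table. The helper name `structured_output_fields`, the OpenAI-style wrapper for GPT, the tool-spec layout for Nova, and the tool and schema names are assumptions for illustration, so check each provider page before relying on them.

```python
from typing import Any


def structured_output_fields(provider: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Return the extra request fields that carry `schema` for one provider.

    Hypothetical helper: the field paths follow the provider table, but the
    wrapper objects around the schema are assumptions, not documented shapes.
    """
    if provider == "gemini":
        return {"extra_body": {"metadata": {"responseSchema": schema}}}
    if provider == "gpt":
        # Assumed OpenAI-style json_schema wrapper; "extraction" is a made-up name.
        return {
            "response_format": {
                "type": "json_schema",
                "json_schema": {"name": "extraction", "schema": schema},
            }
        }
    if provider == "nova":
        # Assumed tool-spec layout; "extract_fields" is a made-up tool name.
        tool_config = {
            "tools": [
                {
                    "toolSpec": {
                        "name": "extract_fields",
                        "description": "Return the extracted fields.",
                        "inputSchema": {"json": schema},
                    }
                }
            ]
        }
        return {"extra_body": {"metadata": {"toolConfig": tool_config}}}
    if provider == "qwen":
        return {"extra_body": {"metadata": {"response_format": {"json_schema": schema}}}}
    raise ValueError(f"Unknown provider: {provider!r}")


# Example schema for a product photo (illustrative only).
product_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "price": {"type": "string"}},
    "required": ["title", "price"],
}
gemini_fields = structured_output_fields("gemini", product_schema)
```

Keeping the schema in one place and varying only where it is attached makes it easier to run the same extraction contract against several providers.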

Suggested Starting Models

| Scenario | Recommended model |
| --- | --- |
| Default image reasoning / extraction | gemini:gemini-3-flash-preview |
| Familiar JSON-first image extraction | gpt:gpt-5-mini |
| Cheapest first pass on large image volume | qwen:qwen3-vl-flash |
| Tool-oriented extraction workflow | nova:us.amazon.nova-lite-v1:0 |

Qwen source pricing is documented per 1K tokens on its provider pages. This guide converts the lowest published tier into 1M token terms for faster cross-provider comparison.
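
The unit conversion is a multiplication by 1,000, since 1M tokens is 1,000 blocks of 1K tokens. A small sketch follows; the function name and the sample price are hypothetical and are not taken from any provider page.

```python
from decimal import Decimal

# 1,000,000 tokens / 1,000 tokens per pricing block = 1,000 blocks
BLOCKS_OF_1K_PER_MILLION = 1000


def per_1k_to_per_1m(price_per_1k: Decimal) -> Decimal:
    """Convert a price quoted per 1K tokens into a price per 1M tokens."""
    return price_per_1k * BLOCKS_OF_1K_PER_MILLION


# Hypothetical price, used only to show the arithmetic.
sample_price_per_1k = Decimal("0.0005")
sample_price_per_1m = per_1k_to_per_1m(sample_price_per_1k)
```

Decimal is used instead of float so that small per-token prices do not pick up binary rounding noise.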

How To Evaluate On Your Own Data

Before standardizing on an image model, compare at least these cases:
  • clean product or document images vs noisy real-world photos
  • captioning vs field extraction
  • free-form output vs strict JSON output
  • small interactive workloads vs bulk batch processing
The right choice often depends more on your output contract than on headline model branding.
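
One way to run that comparison is to score each model on whether its output satisfies your output contract, grouped by test case. The sketch below assumes you have already collected raw text outputs per model and case. The function names, the case labels, and the sample outputs are hypothetical; only the model IDs come from this guide.

```python
import json
from collections import defaultdict
from typing import Iterable


def is_valid_extraction(raw_output: str, required_keys: set[str]) -> bool:
    """True if the output parses as a JSON object containing every required key."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and required_keys.issubset(parsed.keys())


def pass_rates(
    results: Iterable[tuple[str, str, str]], required_keys: set[str]
) -> dict[tuple[str, str], float]:
    """Compute the share of contract-valid outputs per (model, case) pair.

    `results` holds (model, case, raw_output) tuples collected from your own runs.
    """
    totals: dict[tuple[str, str], int] = defaultdict(int)
    passes: dict[tuple[str, str], int] = defaultdict(int)
    for model, case, raw_output in results:
        key = (model, case)
        totals[key] += 1
        if is_valid_extraction(raw_output, required_keys):
            passes[key] += 1
    return {key: passes[key] / totals[key] for key in totals}


# Made-up outputs, standing in for responses you would collect yourself.
sample_results = [
    ("gemini:gemini-3-flash-preview", "clean_product", '{"title": "Mug", "price": "9.00"}'),
    ("gemini:gemini-3-flash-preview", "noisy_photo", '{"title": "Mug"}'),
    ("qwen:qwen3-vl-flash", "clean_product", '{"title": "Mug", "price": "9.00"}'),
    ("qwen:qwen3-vl-flash", "noisy_photo", "The image shows a mug."),
]
rates = pass_rates(sample_results, {"title", "price"})
```

A validity rate only checks the shape of the output. Field-level accuracy still needs a labeled sample from your own data.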