This guide is intentionally practical. It focuses on the request and output patterns you will actually use in Memories.ai, not external benchmark scoreboards.
Quick Picks
Best default
Start with Gemini Image if you want one strong default for image reasoning and schema-based extraction.
Simplest JSON path
Start with GPT Image if you want the most familiar documented response_format path for JSON output.
Lowest-cost batch
Start with Qwen Image if cost is the main constraint and you need image analysis at scale.
Tool-first extraction
Start with Nova Image if your workflow is built around tool-style extraction via toolConfig.
Choose By Workflow
| If your goal is… | Start here | Why |
|---|---|---|
| General-purpose image reasoning | Gemini Image | Strong default when you need reasoning plus schema-style structured output |
| Strict JSON extraction with a familiar API pattern | GPT Image | The page documents response_format directly and uses an OpenAI-style image request/response shape |
| Lowest-cost image analysis | Qwen Image | Cheapest published starting price among the documented image providers |
| Tool-driven extraction pipeline | Nova Image | The docs expose toolConfig for structured extraction through tool specs |
| Explicit thinking toggle plus schema output | Gemini Image or Qwen Image | Both document structured output plus explicit thinking-related fields |
Provider Differences That Matter
| Provider | Documented input shape | Structured output path | Thinking control | When it usually fits best |
|---|---|---|---|---|
| Gemini Image | input_file with image MIME type | extra_body.metadata.responseSchema | thinking_config.thinking_budget | Default image reasoning with schema-based extraction |
| GPT Image | image_url blocks in OpenAI-style messages | response_format | Not documented on this page | Image-only JSON extraction with the simplest request/response mental model |
| Nova Image | image_url plus text; tool config lives in extra_body.metadata.toolConfig | Tool-based extraction via toolConfig | Not documented on this page | Tool-oriented extraction flows and low-cost operational usage |
| Qwen Image | image plus text blocks in the same content array | extra_body.metadata.response_format.json_schema | enable_thinking + thinking_budget | Lowest-cost image analysis with explicit thinking controls |
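As a rough illustration of how the structured output paths in the table differ, the sketch below builds the schema-carrying part of a request as plain Python dictionaries. Only the field paths (response_format, extra_body.metadata.responseSchema, extra_body.metadata.toolConfig, extra_body.metadata.response_format.json_schema) come from the table. The nesting below each path, the helper name structured_output_fields, the tool name, and the example schema are assumptions for illustration, so check each provider page for the exact shape before relying on it.

```python
from typing import Any

# Hypothetical example schema; not taken from any provider page.
INVOICE_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["vendor", "total"],
}


def structured_output_fields(provider: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Return the request fragment that carries the schema for one provider.

    The top-level field paths follow the comparison table; everything nested
    below them is an assumed layout, not a documented one.
    """
    if provider == "gemini":
        return {"extra_body": {"metadata": {"responseSchema": schema}}}
    if provider == "gpt":
        # Assumed OpenAI-style json_schema wrapper under response_format.
        return {
            "response_format": {
                "type": "json_schema",
                "json_schema": {"name": "extraction", "schema": schema},
            }
        }
    if provider == "qwen":
        return {"extra_body": {"metadata": {"response_format": {"json_schema": schema}}}}
    if provider == "nova":
        # Assumed Bedrock Converse-style tool spec; "extract_fields" is a made-up name.
        tool_spec = {
            "name": "extract_fields",
            "description": "Return the extracted fields as structured input.",
            "inputSchema": {"json": schema},
        }
        return {"extra_body": {"metadata": {"toolConfig": {"tools": [{"toolSpec": tool_spec}]}}}}
    raise ValueError(f"unknown provider: {provider}")
```

Keeping the schema separate from the provider-specific wrapper makes it easier to run the same extraction task against more than one provider when comparing them.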
Suggested Starting Models
| Scenario | Recommended model |
|---|---|
| Default image reasoning / extraction | gemini:gemini-3-flash-preview |
| Familiar JSON-first image extraction | gpt:gpt-5-mini |
| Cheapest first pass on large image volume | qwen:qwen3-vl-flash |
| Tool-oriented extraction workflow | nova:us.amazon.nova-lite-v1:0 |
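The model IDs in the table appear to follow a provider:model pattern, and the Nova ID contains a second colon. If you route requests by provider, a minimal sketch like the one below splits only on the first colon. The helper name split_model_id is hypothetical, and the provider-prefix reading of the IDs is an assumption based on the strings in the table.

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split 'provider:model' on the first colon only.

    Assumes the text before the first colon names the provider, which is
    inferred from the IDs in the table rather than documented here.
    """
    provider, sep, model = model_id.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {model_id!r}")
    return provider, model


# IDs copied from the table above.
STARTING_MODELS = [
    "gemini:gemini-3-flash-preview",
    "gpt:gpt-5-mini",
    "qwen:qwen3-vl-flash",
    "nova:us.amazon.nova-lite-v1:0",
]

for model_id in STARTING_MODELS:
    provider, model = split_model_id(model_id)
    print(f"{provider:7s} {model}")
```

Using partition instead of a plain split keeps the trailing ":0" attached to the Nova model name.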
Qwen source pricing is documented per 1K tokens on its provider pages. This guide converts the lowest published tier into 1M token terms for faster cross-provider comparison.
How To Evaluate On Your Own Data
Before standardizing on an image model, compare at least these cases:
- clean product or document images vs noisy real-world photos
- captioning vs field extraction
- free-form output vs strict JSON output
- small interactive workloads vs bulk batch processing
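One way to turn the four contrasts above into a concrete test plan is to cross them into a small evaluation matrix. The sketch below is only a starting point: the axis names and labels are made up for illustration, and crossing every axis is a suggestion rather than a requirement of the guide.

```python
from itertools import product

# Hypothetical axis names and labels, one axis per contrast in the list above.
AXES: dict[str, list[str]] = {
    "image_quality": ["clean", "noisy"],
    "task": ["captioning", "field_extraction"],
    "output": ["free_form", "strict_json"],
    "workload": ["interactive", "batch"],
}


def evaluation_cases(axes: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return one dict per combination of axis values."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


cases = evaluation_cases(AXES)
print(len(cases))  # 16 combinations (2 x 2 x 2 x 2)
print(cases[0])
```

If running all 16 combinations per model is too expensive, a reasonable fallback is to vary one axis at a time around the configuration you expect to use in production.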
