VibeDex

Our Methodology

How VibeDex benchmarks and scores AI image generators — transparently and independently.

Last updated: February 2026

Benchmark Scope

200+

Test Prompts

Photorealism, illustration, typography, product shots, concept art, and edge cases

20

Models Benchmarked

All major providers — same prompts, same conditions, no cherry-picking

3,500+

Evaluations

Every model-prompt pair scored across multiple quality dimensions

How We Evaluate

Every generated image is evaluated by Gemini 3 Pro, our primary vision-language model (VLM) judge. We tested several VLMs, including Gemini 2.5 Pro, Claude Opus, and Claude Sonnet, before selecting Gemini 3 Pro for its consistency, scoring calibration, and ability to assess fine-grained visual quality across diverse styles.

Our prompt suite is designed to isolate specific quality dimensions. Some prompts target photorealistic accuracy, others stress-test text rendering, physical plausibility, or complex multi-subject compositions. Every model runs the same prompts under the same conditions — we generate a single image per model-prompt pair with no cherry-picking or re-rolling.
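The protocol above can be sketched as a simple double loop: one deterministic generation per model-prompt pair, each scored by the judge. The `generate_image` and `judge_image` functions below are hypothetical placeholders; the real generation and judging APIs are not shown here.

```python
import itertools

# Hypothetical stand-ins for the real generation and VLM-judge APIs.
def generate_image(model: str, prompt: str) -> bytes:
    return f"{model}:{prompt}".encode()  # placeholder for an actual API call

def judge_image(image: bytes, prompt: str) -> float:
    return 7.5  # placeholder for a VLM-assigned quality score

MODELS = ["flux-schnell", "seedream-4.0"]  # illustrative subset
PROMPTS = [
    "a glass teapot on a marble table",
    "neon sign reading 'OPEN'",
]

# One image per model-prompt pair, no re-rolling: every model sees the
# same prompts under the same conditions.
results = {}
for model, prompt in itertools.product(MODELS, PROMPTS):
    image = generate_image(model, prompt)      # single generation, kept as-is
    results[(model, prompt)] = judge_image(image, prompt)
```

Keeping the loop deterministic (no re-rolls, no best-of-N selection) is what makes the scores comparable across models.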

What We Measure

Unlike single-score rankings, VibeDex evaluates models across four quality dimensions, each with granular sub-metrics. This means we can match you with the right model for your specific task.

Visual Fidelity

Overall image quality and visual appeal:

  • Aesthetics — artistic quality, color harmony, visual impact
  • Image Quality — sharpness, noise, artifact-free rendering
  • Composition — framing, balance, visual hierarchy

Physics & Logic

Realistic lighting, materials, gravity, and physical plausibility:

  • Static Physics — gravity, support, spatial relationships
  • Material Physics — textures, reflections, transparency
  • Biomechanics — natural poses, joint articulation, movement

Subject & Object Integrity

Accurate anatomy, object coherence, and scene consistency:

  • Human Subjects — anatomy, faces, hands, proportions
  • Object Integrity — structural coherence, correct details
  • Scene Logic — spatial relationships, context consistency

Instruction Adherence

How faithfully the output matches the prompt:

  • Semantic Accuracy — correct subjects, actions, attributes
  • Spatial Framing — camera angle, layout, positioning
  • Text Rendering — accuracy and legibility of in-image text
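The four dimensions and their sub-metrics can be written down as a plain mapping. The names are taken directly from the lists above; the data structure itself is an illustrative assumption, not VibeDex's internal schema.

```python
# Evaluation rubric: four quality dimensions, three sub-metrics each.
# Names come from the methodology text; the dict layout is illustrative.
RUBRIC = {
    "Visual Fidelity": ["Aesthetics", "Image Quality", "Composition"],
    "Physics & Logic": ["Static Physics", "Material Physics", "Biomechanics"],
    "Subject & Object Integrity": ["Human Subjects", "Object Integrity", "Scene Logic"],
    "Instruction Adherence": ["Semantic Accuracy", "Spatial Framing", "Text Rendering"],
}
```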

Scoring Approach

Not all dimensions matter equally for every prompt. A product photography prompt demands high visual fidelity and physics accuracy, while a fantasy illustration prioritizes composition and subject integrity.

Our scoring engine analyzes each prompt to determine which quality dimensions are most important. The primary dimension is scored in depth across its sub-metrics, while the remaining dimensions receive holistic scores. The final score is a weighted combination across all four dimensions, tuned to what your specific prompt demands.
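A minimal sketch of that weighting scheme: the primary dimension is averaged from its sub-metric scores, the rest contribute single holistic scores, and per-dimension weights combine them. The specific weights and scale below are assumptions for illustration, not VibeDex's actual tuning.

```python
def final_score(sub_scores: dict, holistic: dict, weights: dict) -> float:
    """Weighted combination across dimensions.

    sub_scores: sub-metric scores for the primary dimension
    holistic:   one score per remaining dimension
    weights:    per-dimension weights summing to 1 (illustrative values)
    """
    primary_dim = max(weights, key=weights.get)          # heaviest dimension
    primary = sum(sub_scores.values()) / len(sub_scores)  # in-depth average
    total = weights[primary_dim] * primary
    for dim, score in holistic.items():                   # holistic scores
        total += weights[dim] * score
    return total

# Example: a product-photography prompt weights fidelity and physics heavily.
weights = {"Visual Fidelity": 0.4, "Physics & Logic": 0.3,
           "Subject & Object Integrity": 0.2, "Instruction Adherence": 0.1}
subs = {"Aesthetics": 8.0, "Image Quality": 9.0, "Composition": 7.0}
hol = {"Physics & Logic": 8.0, "Subject & Object Integrity": 7.0,
       "Instruction Adherence": 9.0}
score = final_score(subs, hol, weights)  # 0.4*8.0 + 0.3*8.0 + 0.2*7.0 + 0.1*9.0 = 7.9
```

The same machinery, with different weights, handles the fantasy-illustration case: only the weight vector changes per prompt, not the scoring code.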

Limitations

No benchmark is perfect. We believe in being transparent about ours:

  • Automated scoring only — our evaluations are AI-judged. Human review validates trends but does not produce individual scores.
  • English-focused prompt set — all evaluation prompts are currently in English. Multi-language support is planned.
  • Single generation per pair — we generate one image per model-prompt combination. No cherry-picking, but also no variance sampling.
  • Models update frequently — providers ship updates regularly. Our scores reflect performance as of the evaluation date; evaluations are re-run periodically.
  • Artistic subjectivity — style preference is inherently personal. Our scores measure technical quality, not taste.

Models Benchmarked

We currently benchmark 20 image generation models across all major providers. Models are re-evaluated as new versions are released.

Model                 Tier       Cost/Image
Flux Schnell          Budget     $0.0010
Flux Dev              Budget     $0.0030
Qwen Image 2512       Budget     $0.0030
Seedream 3.0          Standard   $0.0180
Reve Image            Standard   $0.0240
Seedream 4.0          Standard   $0.0300
Ideogram 2a           Standard   $0.0320
FLUX.2 Pro            Standard   $0.0350
Nano Banana           Standard   $0.0390
FLUX 1.1 Pro          Standard   $0.0400
Ideogram 3.0          Standard   $0.0400
Seedream 4.5          Standard   $0.0400
Kling Image O1        Standard   $0.0400
FLUX.2 Max            Premium    $0.0700
Hunyuan Image 3.0     Premium    $0.0800
Runway Gen-4 Image    Premium    $0.0800
GPT Image 1.5         Premium    $0.1330
Nano Banana Pro       Premium    $0.1380
Nano Banana 2         Premium    $0.0670
Grok Imagine Image    Standard   $0.0200
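The per-image prices above translate directly into a benchmark-run budget: one image per model-prompt pair over a 200-prompt suite. The snippet below uses a small subset of the table for illustration.

```python
# Back-of-the-envelope cost of a full benchmark run, using per-image
# prices from the table above (illustrative subset of the 20 models).
PRICES = {
    "Flux Schnell": 0.0010,     # Budget tier
    "Seedream 4.0": 0.0300,     # Standard tier
    "Nano Banana Pro": 0.1380,  # Premium tier
}
NUM_PROMPTS = 200  # size of the prompt suite

# Cost per model for one image per prompt.
run_cost = {model: round(price * NUM_PROMPTS, 2)
            for model, price in PRICES.items()}
# Flux Schnell: $0.20 for the full suite vs $27.60 for Nano Banana Pro
```

The spread across tiers is large: a two-order-of-magnitude price difference between the cheapest and most expensive models for the same suite.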

Find the best model for your prompt

VibeDex analyzes your prompt and recommends the best AI image model based on what your specific image demands.