The low-cost tier in non-thinking mode — what beats Gemini 3.1 Flash-Lite?
All models run with reasoning disabled (non-thinking). Eligible = models that support turning thinking off, or running at low/minimal effort (which we treat as the non-thinking equivalent — this keeps gpt-oss in). Cost is effective spend under your workload — 80% input-cache hit rate, 20:1 input/output — and intelligence is measured non-thinking. The frontier below tests one thing: for a given non-thinking intelligence, are open models cheaper than Gemini?
Prepared July 1, 2026 · Prices $/M tokens, on-demand list · Intelligence = Artificial Analysis Intelligence Index. Non-thinking scores shown as estimates alongside published reasoning-mode scores — see the methodology caveat.
Best Flash-Lite alternative: DeepSeek V4 Flash. ~2× cheaper ($0.07 vs $0.145/unit) with higher non-thinking intelligence. Cleanly dominates Flash-Lite.
Cost floor: GPT-5 Nano. ~$0.03/unit (minimal effort). Nova Lite (~$0.04) is the cheapest true thinking-off option. Both are low intelligence.
Frontier owned by: Open models. The mid-frontier is all open-weight: V4 Flash → MiniMax M3 → Kimi K2.6 → V4 Pro. Proprietary models hold only the cost floor (GPT-5 Nano, Nova Lite) and the very top (Gemini 3.5 Flash).
US-origin open option: gpt-oss-120B. At low effort (≈ non-thinking): cheap (~$0.08) but below DeepSeek V4 Flash on intelligence. Edge = fully open + US-origin.
Direct answer: In non-thinking mode, Gemini 3.1 Flash-Lite is Pareto-dominated. DeepSeek V4 Flash costs less per unit ($0.07 vs $0.145: its input and output list prices are lower, although its cached-input price is slightly higher at $0.03 vs $0.025) and scores higher; MiniMax M3 is a rung smarter for a hair more. Gemini's low/mid tier (3.1 Flash-Lite, 3 Flash) sits inside the frontier, so under this workload neither is the rational pick on price-for-intelligence. Gemini only re-earns the frontier at the top (3.5 Flash).
Methodology caveat — read this. Intelligence here is measured at low effort / non-thinking, not reasoning mode. Artificial Analysis publishes mostly reasoning-mode scores, so the Low-effort intel (est.) column is derived from each model's published reasoning score minus a documented delta (hybrids drop ~30% on this reasoning-heavy index at low effort; gpt-oss uses its reasoning_effort: low setting; Kimi K2.6 and Nova are natively non-thinking, so small/no delta). Relative ordering is robust; treat absolute values as directional and confirm per model on AA.
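The estimation step in the caveat can be sketched as follows. This is a minimal illustration that assumes a flat 30% relative drop for every hybrid model; the function name and the flat rate are assumptions, and the estimates in the table do not all follow one flat rate, so treat the output as directional only.

```python
def estimate_low_effort(published_score, relative_drop=0.30):
    """Scale a published reasoning-mode score down by a flat relative drop.

    relative_drop=0.30 is an assumed flat rate for hybrid models;
    natively non-thinking models would use a small or zero drop.
    """
    if not 0.0 <= relative_drop < 1.0:
        raise ValueError("relative_drop must be in [0, 1)")
    return published_score * (1.0 - relative_drop)


# Published reasoning-mode scores taken from the numbers table.
gemini_35_flash = estimate_low_effort(50)
minimax_m3 = estimate_low_effort(44)

print(f"Gemini 3.5 Flash: ~{gemini_35_flash:.1f}")
print(f"MiniMax M3:       ~{minimax_m3:.1f}")
```

Both results land close to the table's estimates (~35 and ~30), but per-model deltas differ, which is why the caveat asks you to confirm each model on AA.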
Running non-thinking actually makes the 20:1 ratio realistic — no reasoning-token blow-up on output, plus lower latency. Cache-read discount depth still dominates: Together / Google / Bedrock-native ≈ 90% off cached input · DeepInfra / Fireworks / Baseten ≈ 50% off · Groq = none.
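As a rough sketch of how the effective cost per unit can be computed: the formula below assumes one "unit" is 1M input tokens plus 1/20 M output tokens, with 80% of input served from cache. That unit definition is an inference, chosen because it reproduces the Eff. $/unit figures in this report; the function and parameter names are illustrative.

```python
def effective_cost(input_price, cached_price, output_price,
                   cache_hit_rate=0.80, input_output_ratio=20.0):
    """Effective $ per unit; prices are $/M tokens.

    One unit is assumed to be 1M input tokens plus
    (1 / input_output_ratio) M output tokens.
    """
    uncached = (1.0 - cache_hit_rate) * input_price   # input billed at list price
    cached = cache_hit_rate * cached_price            # input billed at cache-read price
    output = output_price / input_output_ratio        # output share of the unit
    return uncached + cached + output


flash_lite = effective_cost(0.25, 0.025, 1.50)   # Gemini 3.1 Flash-Lite
v4_flash = effective_cost(0.14, 0.03, 0.28)      # DeepSeek V4 Flash

print(f"Gemini 3.1 Flash-Lite: ${flash_lite:.3f}/unit")
print(f"DeepSeek V4 Flash:     ${v4_flash:.3f}/unit")
```

At an 80% hit rate the cached-input price carries four times the weight of the list input price, which is why cache-discount depth matters so much for this workload.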
1 · Non-thinking Pareto frontier — intelligence vs effective cost
Price axis reversed: cheaper is higher. So the ideal corner is top-right (cheap + smart) and the efficient frontier now reads as a convex-up curve. Every model at its cheapest US-available host, effective cost under 80%-cache / 20:1, non-thinking intelligence on the x-axis.
Reading the convex frontier: cheaper is up, so the efficient curve bows toward the top-right. It runs GPT-5 Nano → DeepSeek V4 Flash → MiniMax M3 → Kimi K2.6 → DeepSeek V4 Pro → Gemini 3.5 Flash. The whole mid-frontier is open-weight; only the cheap floor (GPT-5 Nano; Nova Lite is the cheapest true thinking-off) and the intelligence ceiling (Gemini 3.5 Flash) are proprietary. Gemini 3.1 Flash-Lite, Gemini 3 Flash, and even the cheapest Claude (Haiku 4.5) fall below the curve — each dominated by an open model that's both cheaper and smarter.
Apples-to-oranges guard (● vs ◯): filled points run truly thinking-off (open hybrids + native Nova/Kimi). Hollow points can only reach a low/minimal-effort floor — gpt-oss (reasoning_effort: low) and all Gemini 3.x (thinking_level: minimal, no true zero). So Gemini's points still carry some residual reasoning — if anything that flatters Gemini here, and it still doesn't reach the frontier.
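The frontier described above can be recomputed from the numbers table with a short sweep. This is a sketch using the rounded Eff. $/unit values and estimated non-thinking scores as printed in this report; the strict "cheaper or equal, and smarter" rule and the function name are assumptions about how dominance is defined.

```python
# (model, estimated non-thinking intelligence, effective $/unit) from the numbers table
MODELS = [
    ("GPT-5 Nano", 16, 0.03), ("Amazon Nova Lite", 12, 0.04),
    ("gpt-oss-20B", 11, 0.04), ("DeepSeek V4 Flash", 29, 0.07),
    ("gpt-oss-120B", 18, 0.08), ("Gemini 3.1 Flash-Lite", 22, 0.145),
    ("MiniMax M3", 30, 0.17), ("Gemini 3 Flash", 19, 0.29),
    ("Qwen 3.6 Max", 27, 0.32), ("Amazon Nova Pro", 20, 0.38),
    ("Kimi K2.6", 33, 0.43), ("GLM-5", 27, 0.47),
    ("Claude Haiku 4.5", 30, 0.53), ("DeepSeek V4 Pro", 34, 0.80),
    ("Gemini 3.5 Flash", 35, 0.87), ("Gemini 3.1 Pro", 33, 0.95),
]


def pareto_frontier(models):
    """Return names of models that no cheaper-or-equal model matches or beats on intelligence."""
    # Sort by cost ascending; on cost ties, consider the smarter model first.
    ordered = sorted(models, key=lambda m: (m[2], -m[1]))
    frontier, best_intel = [], float("-inf")
    for name, intel, _cost in ordered:
        if intel > best_intel:   # strictly smarter than everything cheaper
            frontier.append(name)
            best_intel = intel
    return frontier


frontier = pareto_frontier(MODELS)
print(" → ".join(frontier))
```

Because the inputs are estimates, a model sitting just off the frontier (for example Nova Lite next to GPT-5 Nano) could move onto it with a small change in score or price.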
2 · The numbers (non-thinking)
| Model (mode) | Type | Reasoning score (pub.) | Low-effort intel (est.) | Input ($/M) | Cached in ($/M) | Output ($/M) | Eff. $/unit | Frontier? |
|---|---|---|---|---|---|---|---|---|
| GPT-5 Nano (min) | Prop | ~22 | ~16 | $0.05 | $0.005 | $0.40 | $0.03 | ✓ cost floor |
| Amazon Nova Lite (off) | Prop | n/a | ~12 | $0.06 | $0.015 | $0.24 | $0.04 | ✓ cheapest true-off |
| gpt-oss-20B (low eff) | OW | ~15 | ~11 | $0.05 | $0.025 | $0.20 | $0.04 | near floor |
| DeepSeek V4 Flash (off) | OW | 47 | ~29 | $0.14 | $0.03 | $0.28 | $0.07 | ✓ ★ best value |
| gpt-oss-120B (low eff) | OW | 24 | ~18 | $0.09 | $0.045 | $0.45 | $0.08 | dominated |
| Gemini 3.1 Flash-Lite (min) | Gemini | 25 | ~22 | $0.25 | $0.025 | $1.50 | $0.145 | dominated |
| MiniMax M3 (off) | OW | ~44 | ~30 | $0.30 | $0.06 | $1.20 | $0.17 | ✓ frontier |
| Gemini 3 Flash (min) | Gemini | 27 | ~19 | $0.50 | $0.05 | $3.00 | $0.29 | dominated |
| Qwen 3.6 Max (off) | OW | 40 | ~27 | ~$0.60 | $0.06 | $3.00 | $0.32 | dominated |
| Amazon Nova Pro (off) | Prop | n/a | ~20 | $0.80 | $0.08 | $3.20 | $0.38 | dominated |
| Kimi K2.6 (off, native) | OW | 54 | ~33 | $0.75 | $0.125 | $3.50 | $0.43 | ✓ frontier |
| GLM-5 (off) | OW | ~40 | ~27 | $1.05 | $0.105 | $3.50 | $0.47 | dominated |
| Claude Haiku 4.5 (off) | Prop | ~44 | ~30 | $1.00 | $0.10 | $5.00 | $0.53 | dominated |
| DeepSeek V4 Pro (off) | OW | 52 | ~34 | $2.10 | $0.20 | $4.40 | $0.80 | ✓ frontier |
| Gemini 3.5 Flash (min) | Gemini | 50 | ~35 | $1.50 | $0.15 | $9.00 | $0.87 | ✓ intel. leader |
| Gemini 3.1 Pro* (min) | Gemini | ~46 | ~33 | ~$1.50 | $0.15 | ~$12.00 | ~$0.95 | dominated |
Reasoning scores are published (default/max-effort per AA). Non-thinking = estimate (see caveat). Eff. cost uses each model's cheapest US-available host and that host's cache discount; DeepSeek V4 Pro is also available first-party at ~$0.44/$0.87 (eff ~$0.22) but that routes data to DeepSeek's servers — a residency trade-off. Kimi K2.6 & Nova are natively non-thinking; Claude Haiku 4.5 is non-thinking by default (extended thinking is opt-in); GPT-5 Nano runs at minimal effort (not fully off). Claude/GPT non-thinking scores are estimates. *Gemini 3.1 Pro output price estimated.
This is the crux of your question. Flash-Lite (non-thinking): intelligence ~22, ~$0.145/unit. Only one eligible model is strictly cheaper and at least as smart (DeepSeek V4 Flash); a second is cheaper still but clearly weaker; a third costs a hair more for a real intelligence jump.
| Model (type, mode) | Non-think intel | Eff. $/unit | vs Flash-Lite |
|---|---|---|---|
| DeepSeek V4 Flash (OW, off) | ~29 (higher) | $0.07 | 2.1× cheaper + smarter: the answer ★ |
| Amazon Nova Lite (Prop, off) | ~12 (lower) | $0.04 | 3.6× cheaper but a clear step down in intelligence; AWS-native |
| MiniMax M3 (OW, off) | ~30 (higher) | $0.17 | ~same price, meaningfully smarter: the "step up" pick |
Everything else eligible is either dominated by DeepSeek V4 Flash (Gemini 3 Flash, Qwen 3.6, GLM-5, Nova Pro, gpt-oss-120B) or a paid step up in intelligence (Kimi K2.6 $0.43, DeepSeek V4 Pro $0.80). gpt-oss-120B/20B (run at low effort ≈ non-thinking) come in cheap but land below DeepSeek V4 Flash on low-effort intelligence — their real edge is being fully open-weight and US-origin, if Chinese-model provenance is a concern for you.
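The comparison against Flash-Lite can be expressed as a small classifier. This is a sketch using the rounded values from this section; the verdict labels and function name are illustrative, and the price ratio is simply baseline cost divided by candidate cost.

```python
BASELINE = ("Gemini 3.1 Flash-Lite", 22, 0.145)   # (name, intel est., eff. $/unit)
CANDIDATES = [
    ("DeepSeek V4 Flash", 29, 0.07),
    ("Amazon Nova Lite", 12, 0.04),
    ("MiniMax M3", 30, 0.17),
]


def compare(candidate, baseline):
    """Return (price ratio vs baseline, verdict) for one candidate."""
    _, intel, cost = candidate
    _, base_intel, base_cost = baseline
    ratio = round(base_cost / cost, 1)   # >1 means the candidate is cheaper
    if cost < base_cost and intel >= base_intel:
        verdict = "dominates baseline"
    elif cost < base_cost:
        verdict = "cheaper but weaker"
    elif intel > base_intel:
        verdict = "step up (costs more)"
    else:
        verdict = "dominated by baseline"
    return ratio, verdict


for candidate in CANDIDATES:
    ratio, verdict = compare(candidate, BASELINE)
    print(f"{candidate[0]:<20} {ratio}x  {verdict}")
```

Only DeepSeek V4 Flash comes out as dominating the baseline; Nova Lite trades intelligence for price, and MiniMax M3 trades a little price for intelligence.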
4 · Where to host each Pareto-frontier model
The six frontier models, plus Nova Lite as the cheapest true thinking-off option, and their hosting options ($/M, input / cached-in / output). The cheapest US-available host is marked in the notes. The open-weight models are generally cheapest on their own first-party API (Kimi K2.6 is cheaper per token routed through OpenRouter), but first-party APIs route data to China; the US hosts (DeepInfra, Together, Fireworks, Baseten) cost more and keep data out. Nova / GPT-5 Nano / Gemini are single-vendor.
| Model (type) | Host | Input | Cached in | Output | Notes |
|---|---|---|---|---|---|
| GPT-5 Nano (Prop) | OpenAI API | $0.05 | $0.005 | $0.40 | single vendor; 90% cache |
| GPT-5 Nano (Prop) | Azure OpenAI | $0.05 | $0.005 | $0.40 | enterprise / compliance |
| DeepSeek V4 Flash (OW) | DeepInfra (US) | $0.14 | $0.07 | $0.28 | cheapest US host; 50% cache |
| DeepSeek V4 Flash (OW) | DeepSeek API / OpenRouter | $0.14 | $0.003 | $0.28 | cheapest overall; data→China |
| DeepSeek V4 Flash (OW) | Together (US) | ~$0.35 | ~$0.035 | ~$0.50 | ~90% cache (best for cache-heavy) |
| DeepSeek V4 Flash (OW) | Fireworks (US) | ~$0.40 | ~$0.20 | ~$0.55 | high-RPM SLAs |
| MiniMax M3 (OW) | DeepInfra (US) | ~$0.30 | $0.15 | ~$1.20 | cheapest US host |
| MiniMax M3 (OW) | MiniMax API | $0.30 | n/a | $1.20 | cheapest overall; data→China |
| MiniMax M3 (OW) | Together (US) | ~$0.35 | $0.06 | ~$1.30 | deep cache |
| Kimi K2.6 (OW) | DeepInfra (FP4, US) | $0.75 | $0.375 | $3.50 | cheapest US host; ~$1.44 blended |
| Kimi K2.6 (OW) | OpenRouter | $0.55 | n/a | $3.20 | cheapest per-token (routed) |
| Kimi K2.6 (OW) | Moonshot API | $0.95 | n/a | $4.00 | first-party; data→China |
| Kimi K2.6 (OW) | Together / Fireworks / Parasail (US) | | | | ~$1.15–1.71 blended; also available |
| DeepSeek V4 Pro (OW) | DeepInfra (US) | $1.30 | $0.65 | $2.60 | cheapest US host; 50% cache |
| DeepSeek V4 Pro (OW) | DeepSeek API | $0.44 | ~$0.04 | $0.87 | cheapest overall; data→China |
| DeepSeek V4 Pro (OW) | Together (US) | $2.10 | $0.20 | $4.40 | ~90% cache; best cache-heavy |
| DeepSeek V4 Pro (OW) | Baseten (US) | $1.74 | ~$0.87 | $3.48 | dedicated deployments |
| Nova Lite (Prop) | AWS Bedrock (only) | $0.06 | $0.015 | $0.24 | AWS-native; 75% cache |
| Gemini 3.5 Flash (Gemini) | Google AI Studio | $1.50 | $0.15 | $9.00 | 90% cache |
| Gemini 3.5 Flash (Gemini) | Vertex AI | $1.50 | $0.15 | $9.00 | enterprise / compliance |
Host takeaway: for the open-weight frontier models, the model's own first-party API is usually the cheapest option (the exception is Kimi K2.6, where OpenRouter undercuts Moonshot per token), but it sends data to China. For US data-residency, DeepInfra has the lowest sticker prices among US hosts (50% cache discount); Together can win on your cache-heavy 80%-hit workload (~90% cache-read discount) despite higher sticker; Fireworks/Baseten are for SLAs and dedicated deployments. Nova is Bedrock-only, GPT-5 Nano is OpenAI/Azure, Gemini is Google/Vertex.
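To illustrate the sticker-versus-cache trade-off, the sketch below ranks the US hosts listed for DeepSeek V4 Pro under the 80%-cache, 20:1 workload. It assumes the same unit as the rest of the report (1M input tokens plus 1/20 M output tokens, inferred from the Eff. $/unit figures); Baseten's cached price is approximate in the source, and the names are illustrative.

```python
# (input, cached input, output) in $/M tokens, from the hosting table
US_HOSTS_V4_PRO = {
    "DeepInfra": (1.30, 0.65, 2.60),
    "Together": (2.10, 0.20, 4.40),
    "Baseten": (1.74, 0.87, 3.48),   # cached price is approximate in the source
}


def effective_cost(input_price, cached_price, output_price,
                   cache_hit_rate=0.80, input_output_ratio=20.0):
    """Effective $ per unit (1M input + 1/ratio M output), assumed definition."""
    return ((1.0 - cache_hit_rate) * input_price
            + cache_hit_rate * cached_price
            + output_price / input_output_ratio)


costs = {host: effective_cost(*prices) for host, prices in US_HOSTS_V4_PRO.items()}
cheapest = min(costs, key=costs.get)

for host, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{host:<10} ${cost:.2f}/unit")
print(f"Cheapest at 80% cache hits: {cheapest}")
```

For this model, Together's deeper cache discount more than offsets its higher list prices at an 80% hit rate. The result depends on the model and the hit rate, so rerun it with your own host prices.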
5 · What to do
Replace Gemini 3.1 Flash-Lite with DeepSeek V4 Flash. Non-thinking, ~2× cheaper effective ($0.07 vs $0.145), and higher intelligence. It's the single clearest win in this analysis.
Want the absolute floor and don't need the intelligence? Nova Lite (~$0.04) — natively non-thinking, AWS-native, zero compliance friction. Good for bulk/simple classification where ~12 intelligence suffices.
Need one notch more capability? MiniMax M3 (~$0.17) for roughly Flash-Lite's price, or step to Kimi K2.6 ($0.43, natively non-thinking) for the strongest non-thinking open model in the mid band.
Need US-origin / non-Chinese open weights? gpt-oss-120B at low effort (~$0.08) is the pick: it's dominated on raw intelligence by DeepSeek V4 Flash, but it's fully open, self-hostable, and avoids Chinese-model provenance concerns.
Reserve Gemini 3.5 Flash for the genuine top of the band — it leads non-thinking intelligence but costs ~12× DeepSeek V4 Flash, driven by its $9 output.
Cheapest possible tokens? GPT-5 Nano (~$0.03) at minimal effort is the floor — below Nova Lite — but at low intelligence (~16). Fine for trivial routing/extraction; use Nova Lite if you need a genuinely thinking-off model.
Don't reach for the cheapest Claude or GPT for this workload. Claude Haiku 4.5 (~$0.53) is dominated — MiniMax M3 matches its intelligence at ~⅓ the cost, and Kimi K2.6 beats it on both. Claude/GPT premium buys reliability and ecosystem, not price-for-intelligence here.
Host = price lever. Put open models on DeepInfra (cheapest tokens) or Together (deepest ~90% cache discount — wins on your cache-heavy workload). Keep Bedrock for Nova (its cache discount doesn't extend to third-party open models).
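The "host = price lever" point can be made concrete by solving for the cache hit rate at which two hosts cost the same. This is a sketch under the same assumed unit (1M input tokens plus 1/20 M output tokens), using the DeepSeek V4 Pro prices from the hosting table; the function name is hypothetical.

```python
def break_even_hit_rate(host_a, host_b, input_output_ratio=20.0):
    """Cache hit rate at which two hosts cost the same per unit.

    Each host is (input, cached input, output) in $/M tokens.
    Per-unit cost is modelled as base - hit_rate * saving, so the two
    lines cross where hit_rate = (base_b - base_a) / (saving_b - saving_a).
    Returns None if the lines are parallel; the result may fall outside
    [0, 1], meaning one host is cheaper at every hit rate.
    """
    in_a, cached_a, out_a = host_a
    in_b, cached_b, out_b = host_b
    base_a = in_a + out_a / input_output_ratio   # cost with 0% cache hits
    base_b = in_b + out_b / input_output_ratio
    saving_a = in_a - cached_a                   # $ saved per fully cached unit
    saving_b = in_b - cached_b
    if saving_a == saving_b:
        return None
    return (base_b - base_a) / (saving_b - saving_a)


deepinfra = (1.30, 0.65, 2.60)
together = (2.10, 0.20, 4.40)
rate = break_even_hit_rate(deepinfra, together)
print(f"Together beats DeepInfra above a {rate:.1%} cache hit rate")
```

With these list prices the crossover sits a little above 70%, so an 80%-hit workload favors the deeper cache discount while a lightly cached workload favors the lower sticker price.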
Bottom line: With non-thinking mode enforced, open models still win the price-for-intelligence frontier everywhere except the extreme top. Gemini 3.1 Flash-Lite is not on the frontier — DeepSeek V4 Flash beats it on both axes. You'd only stay on Gemini for managed-API reliability, multimodality, or data-residency reasons, not for cost or non-thinking intelligence.
Caveats. Non-thinking Intelligence Index values are estimates derived from published reasoning scores (hybrids ≈ −30%; Kimi/Nova native non-thinking) — AA publishes mostly reasoning-mode; verify per model. Qwen 3.6 Max input, Gemini 3.1 Pro output, and MiniMax M3 cache rate are approximate. Prices are on-demand list and move fast — confirm on live pages before committing.