GPT-5.4 Mini and Nano Arrive, but Higher Prices Leave a Bitter Taste
OpenAI unveiled GPT-5.4 Mini and Nano as its 'most powerful small models.' Mini reaches 94% of the full model's SWE-Bench Pro score, but pricing up to 4x higher than predecessors and long-context performance gaps are sparking debate.
On March 17, 2026, OpenAI unveiled GPT-5.4 Mini and GPT-5.4 Nano. Marketed as the company's "most powerful small models," both support 400K-token context windows and text/image multimodal input, posting benchmark numbers that approach the full-size GPT-5.4. But the pricing tells a different story: input costs have jumped up to 4x compared to the previous GPT-5 mini.
Small Models Reaching 94% of Full Model Performance
GPT-5.4 Mini scored 54.4% on SWE-Bench Pro, reaching 94% of the full GPT-5.4's 57.7%. Compared to GPT-5 mini's 45.7%, this represents a generational leap.
The core benchmark numbers highlight Mini's impressive gains. On GPQA Diamond, which measures advanced scientific reasoning, Mini scored 88.0%, narrowing the gap with the full model (93.0%) to just 5 percentage points. On Terminal-Bench 2.0, it posted 60.0%, surging 21.8 points ahead of GPT-5 mini's 38.2%. In Toolathlon, which evaluates tool-use capability, Mini reached 42.9%, well beyond GPT-5 mini's 26.9%.
Nano delivers surprising numbers for its price tier as well. With 52.4% on SWE-Bench Pro and 82.8% on GPQA Diamond, it outperforms previous-generation mid-tier models. The fact that OpenAI's cheapest model ever exceeds earlier mini-class performance underscores how rapidly small models are evolving.
Performance Gains Shadowed by Higher Prices
The sticking point is pricing. GPT-5.4 Mini's API costs $0.75 per million input tokens and $4.50 per million output tokens. Cached input drops to $0.075, but compared to GPT-5 mini, input costs have tripled and output costs are 2.25x higher. Nano sits at $0.20 input and $1.25 output, representing 4x and 3.125x increases respectively over its predecessor.
| Model | Input/1M tokens | Cached | Output/1M tokens | vs Previous |
|---|---|---|---|---|
| GPT-5.4 Mini | $0.75 | $0.075 | $4.50 | 3x input, 2.25x output |
| GPT-5.4 Nano | $0.20 | $0.02 | $1.25 | 4x input, 3.125x output |
| Gemini 3.1 Flash-Lite | $0.25 | - | $1.50 | - |
| Claude 4.5 Haiku | Higher | - | Higher | - |
Nano remains cheaper than Gemini 3.1 Flash-Lite ($0.25 input, $1.50 output), maintaining its position in the lowest-cost tier. But the disconnect between the market's expectation that small models should be cheap and the reality of up to 4x price increases has sparked heated debate. Whether the performance gains justify the price hikes ultimately depends on each developer's usage patterns.
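Since the verdict depends on usage patterns, a quick per-request calculation helps. The sketch below (the prices come from the comparison table above; the 8K-prompt/1K-completion request size is purely illustrative) compares Nano with Gemini 3.1 Flash-Lite:

```python
# $ per 1M tokens, taken from the comparison table above
PRICES = {
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
    "gemini-3.1-flash-lite": {"input": 0.25, "output": 1.50},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the listed per-1M-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Illustrative request: 8K-token prompt, 1K-token completion
for model in PRICES:
    print(f"{model}: ${request_cost(model, 8_000, 1_000):.5f}")
# gpt-5.4-nano:          $0.00285
# gemini-3.1-flash-lite: $0.00350
```

At these sizes Nano comes in roughly 19% cheaper per request, but the ranking can shift for output-heavy workloads, which is exactly why per-workload math matters more than headline rates.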
Developer Community: Between Praise and Skepticism
Reactions from Hacker News and X (formerly Twitter) are split. Some developers praised improved step-by-step reasoning in coding tasks, while others pointed out persistent shortcomings in frontend code quality and external tool integration. Questions about whether real-world production coding performance matches the benchmark numbers surfaced repeatedly.
Pushback against the price increases is considerable. A representative criticism runs: "If costs go up alongside performance, what's the point of small models?" The counterargument is that cost per unit of performance remains far below the full model, and workloads with high cache hit rates can cut actual spend significantly.
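The cache-hit counterargument is easy to quantify with standard weighted-average arithmetic. A minimal sketch (the $0.25 GPT-5 mini input rate is inferred from the 3x increase cited above, not stated directly by OpenAI):

```python
def blended_input_cost(base: float, cached: float, hit_rate: float) -> float:
    """Effective $ per 1M input tokens when hit_rate of tokens are cache hits."""
    return hit_rate * cached + (1.0 - hit_rate) * base

# GPT-5.4 Mini: $0.75 base input, $0.075 cached input
for hit_rate in (0.0, 0.5, 0.9):
    cost = blended_input_cost(0.75, 0.075, hit_rate)
    print(f"cache hit rate {hit_rate:.0%}: ${cost:.4f} per 1M input tokens")

# At a 90% hit rate the blended rate ($0.1425) falls below the
# previous GPT-5 mini's inferred $0.25 base input price.
```

In other words, agents that repeatedly resend long shared prefixes (system prompts, tool schemas, codebase context) may see effective input costs below the old generation despite the 3x sticker increase.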
A New Phase in the Small Model War
The launch of GPT-5.4 Mini and Nano escalates competition in the small AI model market. Google's Gemini 3.1 Flash-Lite leads in throughput and speed, Anthropic's Claude 4.5 Haiku specializes in agent workflows, and now OpenAI plays the near-full-model performance card.
Yet the title of 'most powerful small model' does not automatically mean 'most suitable small model.' Long-context performance degradation, up to 4x price increases over predecessors, and Nano's API-only availability are variables developers must weigh before choosing. The same question that dominates the frontier model market repeats itself in the small model space: which model best fits my workload? The lesson that topping benchmarks does not equal being the best choice applies here just as much.
Sources
- OpenAI - Introducing GPT-5.4 Mini and Nano
- The Decoder - OpenAI ships GPT-5.4 Mini and Nano: faster and more capable, but up to 4x pricier
- Simon Willison - GPT-5.4 Mini and Nano
- 9to5Mac - OpenAI releases GPT-5.4 Mini and Nano, its most capable small models yet
- Adam Holter - GPT-5.4 Mini and Nano: benchmarks, pricing, and what they're actually good for