# GPT-5.4
**Benchmark scores:** GPQA Diamond 92 · GDPval 83 · LM Arena Elo 1484

**Available via:** API, Chat, Batch, Assistants
GPT-5.4, released March 5, 2026, delivered record benchmark scores, particularly in computer-use tasks (OSWorld-Verified, WebArena Verified), and posted a record 83% on GDPval.
## Benchmarks
| Benchmark | Score | Notes |
|---|---|---|
| GPQA Diamond | 92.0% | #2, behind Claude Opus 4.6 |
| GDPval | 83.0% | Record score |
| LM Arena Elo | 1484 | #4 (Standard), #2 with High variant |
| OSWorld-Verified | Record | Computer-use benchmark |
| WebArena Verified | Record | Web navigation benchmark |
## Pricing
| Per 1M tokens | Price |
|---|---|
| Input | $2.50 |
| Output | $10.00 |
Competitively priced — cheaper than Claude Opus for comparable frontier performance.
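The per-token rates above translate directly into per-request costs. A minimal sketch, using the listed $2.50/$10.00 rates; the token counts in the example are illustrative, not from the source:

```python
# Estimated cost of a single GPT-5.4 request at the listed rates.
INPUT_PER_M = 2.50    # USD per 1M input tokens (from the pricing table)
OUTPUT_PER_M = 10.00  # USD per 1M output tokens (from the pricing table)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# Example: a 4,000-token prompt with a 1,000-token completion.
print(f"${request_cost(4_000, 1_000):.4f}")  # → $0.0200
```

Note that output tokens cost 4x input tokens, so completion-heavy workloads (long reasoning traces, code generation) dominate the bill.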
## Variants
| Variant | Use case |
|---|---|
| GPT-5.4 Standard | General-purpose |
| GPT-5.4 Thinking | Extended reasoning with chain-of-thought |
| GPT-5.4 Pro | Maximum quality, higher cost |
## Strengths
- Best computer-use model available (OSWorld, WebArena records)
- Strong price/performance — $2.50/$10 undercuts Opus pricing
- GDPval record suggests strong real-world task completion
## Weaknesses
- Trails Claude Opus on GPQA Diamond by 2.3 points
- 256K context window vs Claude’s 1M
- OpenAI stopped reporting SWE-bench Verified scores (data contamination concerns)
## What’s next
GPT-5.5 (codenamed “Spud”) has completed pretraining; another major release is expected soon.