# GPT-5.4
**Benchmark scores:** GPQA Diamond 92 · GDPval 83 · LM Arena Elo 1484

**Available via:** API, Chat, Batch, Assistants
GPT-5.4, released March 5, 2026, delivered record benchmark scores, particularly in computer-use tasks (OSWorld-Verified, WebArena Verified), and posted a record 83% on GDPval.
## Benchmarks
| Benchmark | Score | Notes |
|---|---|---|
| GPQA Diamond | 92.0% | #2, behind Claude Opus 4.6 |
| GDPval | 83.0% | Record score |
| LM Arena Elo | 1484 | #4 (Standard), #2 with High variant |
| OSWorld-Verified | Record | Computer-use benchmark |
| WebArena Verified | Record | Web navigation benchmark |
## Pricing
| Per 1M tokens | Price |
|---|---|
| Input | $2.50 |
| Output | $10.00 |
Competitively priced — cheaper than Claude Opus for comparable frontier performance.
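The per-token rates above translate directly into per-request costs. A minimal sketch, using the listed $2.50/$10.00 rates; the token counts in the example are illustrative, not from the source:

```python
# Estimated cost of a single GPT-5.4 request at the listed rates.
INPUT_PER_M = 2.50    # USD per 1M input tokens (from the pricing table)
OUTPUT_PER_M = 10.00  # USD per 1M output tokens (from the pricing table)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# Example: a 4,000-token prompt with a 1,000-token completion.
print(f"${request_cost(4_000, 1_000):.4f}")  # → $0.0200
```

Note that output tokens cost 4x input tokens, so completion-heavy workloads (long reasoning traces, code generation) dominate the bill.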
## Variants
| Variant | Use case |
|---|---|
| GPT-5.4 Standard | General-purpose |
| GPT-5.4 Thinking | Extended reasoning with chain-of-thought |
| GPT-5.4 Pro | Maximum quality, higher cost |
## Strengths
- Best computer-use model available (OSWorld, WebArena records)
- Strong price/performance — $2.50/$10 undercuts Opus pricing
- GDPval record suggests strong real-world task completion
## Weaknesses
- Trails Claude Opus on GPQA Diamond by 2.3 points
- 256K context window vs Claude’s 1M
- OpenAI stopped reporting SWE-bench Verified scores (data contamination concerns)
## What’s next
GPT-5.5 (codenamed “Spud”) has completed pretraining; another major release is expected soon.