Gemini 3.1 Pro
Benchmark scores: 78.8 SWE-bench Verified · 90.8 GPQA Diamond · 77.1 ARC-AGI-2 · 1493 LM Arena Elo
Available via: API, Chat, Batch, Vertex AI
Gemini 3.1 Pro is the strongest all-around model across multiple independent benchmarks, leading or placing near the top of every major evaluation listed below.
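For the API route, here is a minimal sketch using the google-genai Python SDK. The model ID string is an assumption for illustration; this page does not list the exact identifier.

```python
# Minimal sketch, assuming the google-genai SDK and a hypothetical model ID.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-3.1-pro",  # assumed ID; check the model catalog for the exact string
    contents="Summarize the SWE-bench Verified benchmark in two sentences.",
)
print(response.text)
```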
Benchmarks
| Benchmark | Score | Rank |
|---|---|---|
| SWE-bench Verified | 78.8% | Leading (among GA models) |
| GPQA Diamond | 90.8% | #4 |
| ARC-AGI-2 | 77.1% | #1 |
| LM Arena Elo | 1493 | #2 |
Pricing
| Token type | Price (USD per 1M tokens) |
|---|---|
| Input | $2.00 |
| Output | $12.00 |
Cheaper on input pricing than both Claude Opus and GPT-5.4.
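As a quick sanity check on what these rates mean per request, a small sketch that applies the listed input and output prices to a token count (the token counts in the example are illustrative):

```python
# Estimate per-request cost from the listed Gemini 3.1 Pro rates.
INPUT_USD_PER_M = 2.00    # USD per 1M input tokens (pricing table above)
OUTPUT_USD_PER_M = 12.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    return (input_tokens / 1_000_000) * INPUT_USD_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_USD_PER_M

# Illustrative example: a 200k-token document plus a 2k-token answer.
print(f"${request_cost(200_000, 2_000):.4f}")  # $0.4240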
Variants
| Model | Input/Output (per 1M) | Use case |
|---|---|---|
| Gemini 3.1 Pro | $2.00/$12.00 | Flagship reasoning |
| Gemini 3.1 Flash | $0.50/$3.00 | Fast, balanced |
| Gemini 3.1 Flash-Lite | $0.10/$0.40 | Ultra-cheap, high volume |
| Gemini 3.1 Ultra | Premium (pricing not yet public) | Native multimodal reasoning |
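The "Use case" column above suggests a simple routing rule; a minimal sketch of that idea follows. The model ID strings and the routing criteria are assumptions, not an official recommendation.

```python
# Hedged sketch: route requests to a 3.1 variant based on the use cases above.
# The ID strings are assumed; Ultra is omitted because its pricing is not yet public.
def pick_variant(needs_deep_reasoning: bool, high_volume: bool) -> str:
    if needs_deep_reasoning:
        return "gemini-3.1-pro"         # flagship reasoning
    if high_volume:
        return "gemini-3.1-flash-lite"  # ultra-cheap, high volume
    return "gemini-3.1-flash"           # fast, balanced default

print(pick_variant(needs_deep_reasoning=False, high_volume=True))  # gemini-3.1-flash-lite
```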
Strengths
- Best price among frontier models ($2/$12)
- 2M context window — largest of any frontier model
- Native multimodal (Ultra variant) — not bolted-on vision
- ARC-AGI-2 leader — strongest abstract reasoning
Weaknesses
- GPQA trails Claude Opus by 3.5 points
- Google’s API ecosystem (Vertex AI) adds complexity vs simpler APIs
- Ultra pricing not yet publicly available
Architecture
The model is natively multimodal from the ground up: it processes text, images, audio, and video in a single model rather than routing through separate encoders. The 2M-token context window makes it especially strong for large-document analysis.
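A minimal sketch of the large-document pattern, again using the google-genai Python SDK; the model ID and the file path are placeholders, not values taken from this page.

```python
# Hedged sketch: long-document analysis leaning on the large context window.
# The model ID and the file path are assumptions for illustration only.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

with open("annual_report.txt", "r", encoding="utf-8") as f:
    document = f.read()  # a very long document; the window is stated as 2M tokens

response = client.models.generate_content(
    model="gemini-3.1-pro",  # assumed ID
    contents=[document, "List every risk factor mentioned, one line each."],
)
print(response.text)
```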