DeepSeek V4 at $0.28/M — what 1T parameters means for cost
DeepSeek V4 costs $0.28 per million input tokens. Claude Opus costs $5. That’s an 18x price difference. But is V4 actually 18x worse? Let’s do the math.
Cost per task: real workloads
Assumptions: average task uses 10K input + 3K output tokens.
| Model | Input cost | Output cost | Total per task | 10,000 tasks |
|---|---|---|---|---|
| Claude Opus 4.6 | $0.050 | $0.075 | $0.125 | $1,250 |
| GPT-5.4 | $0.025 | $0.030 | $0.055 | $550 |
| Claude Sonnet 4.6 | $0.030 | $0.045 | $0.075 | $750 |
| Gemini 3.1 Pro | $0.020 | $0.036 | $0.056 | $560 |
| DeepSeek V4 | $0.003 | $0.003 | $0.006 | $60 |
| DeepSeek V4 Lite | $0.001 | $0.002 | $0.003 | $30 |
Per task, DeepSeek V4 is roughly 21x cheaper than Claude Opus ($0.006 vs. $0.125). At 10,000 tasks, that's $1,190 saved.
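The table's arithmetic can be reproduced from per-million-token prices. A minimal sketch; apart from DeepSeek's stated $0.28/M input price, the per-million prices below are back-derived from the table and should be treated as assumptions. Small differences from the table (e.g. 21.6x vs. 21x) come from its three-decimal rounding:

```python
# Per-million-token prices: (input $/M, output $/M).
# Only DeepSeek V4's $0.28/M input is stated in the text;
# the rest are back-derived from the table and are assumptions.
PRICES = {
    "Claude Opus 4.6":   (5.00, 25.00),
    "GPT-5.4":           (2.50, 10.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 3.1 Pro":    (2.00, 12.00),
    "DeepSeek V4":       (0.28, 1.00),
    "DeepSeek V4 Lite":  (0.10, 0.67),
}

def cost_per_task(model: str, input_tokens: int = 10_000,
                  output_tokens: int = 3_000) -> float:
    """Dollar cost of one task at the assumed token mix."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

opus = cost_per_task("Claude Opus 4.6")   # 0.125, matching the table
v4 = cost_per_task("DeepSeek V4")         # 0.0058, table rounds to 0.006
print(f"Opus/V4 per-task ratio: {opus / v4:.1f}x")
print(f"Savings over 10,000 tasks: ${(opus - v4) * 10_000:,.0f}")
```

Changing `input_tokens`/`output_tokens` shows how the ratio shifts with the workload's token mix.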
But what do you lose?
| Benchmark | DeepSeek V4 | Claude Opus | Gap |
|---|---|---|---|
| SWE-bench Verified | 72.5% | 80.8% | -8.3 pts |
| GPQA Diamond | 84.0% | 94.3% | -10.3 pts |
| LM Arena Elo | 1445 | 1504 | -59 pts |
V4 trails Opus by 8-10 points on major benchmarks. That’s significant for frontier tasks (novel reasoning, PhD-level science, complex code architecture) but often invisible for routine work (summarization, data extraction, classification, format conversion).
The 80/20 rule for model selection
Use DeepSeek V4 when:
- Task is well-defined (extraction, classification, summarization)
- You’re processing high volume (thousands of items)
- The quality difference between 84% and 94% (the GPQA gap above) doesn't matter for your use case
- Cost is a constraint
Use frontier models (Opus, GPT-5.4, Gemini 3.1) when:
- Task requires novel reasoning or creativity
- Errors are expensive (security review, medical, legal)
- You need the best available quality regardless of cost
The smart approach: Use DeepSeek V4 for data gathering and preprocessing, then pass the structured results to Opus for analysis and synthesis. This is exactly what multi-model agent pipelines are designed for.
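As a sketch of that pipeline, with the model clients passed in as plain callables (hypothetical stand-ins for whichever SDK you use; the prompts here are illustrative, not prescribed):

```python
from typing import Callable

def two_stage(documents: list[str],
              cheap: Callable[[str], str],
              frontier: Callable[[str], str]) -> str:
    """Cheap model handles high-volume extraction; frontier model
    makes one pass over the structured results for synthesis."""
    # Stage 1: one cheap call per document (this is where volume lives).
    extracted = [cheap(f"Extract the key facts as JSON:\n{doc}")
                 for doc in documents]
    # Stage 2: a single frontier call over the combined extractions.
    combined = "\n".join(extracted)
    return frontier(f"Synthesize a report from these extractions:\n{combined}")
```

With 1,000 documents, this buys 1,000 calls at DeepSeek prices and only one at Opus prices, which is where the savings in the table above actually compound.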
Context window trade-off
DeepSeek V4’s 128K-token context window is much smaller than Claude’s 1M or Llama’s 10M. For document analysis, code review across large repos, or conversations with long histories, this is a real limitation.
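For a quick pre-flight check, a rough token estimate can flag workloads that won't fit. This sketch assumes the common ~4-characters-per-token heuristic, which varies by tokenizer and content:

```python
def fits_context(text: str, history: str = "", window: int = 128_000,
                 chars_per_token: float = 4.0,
                 reserve_output: int = 4_000) -> bool:
    """Rough check that input plus reserved output fits the window.
    The chars-per-token ratio is a heuristic, not a tokenizer."""
    est_tokens = len(text + history) / chars_per_token
    return est_tokens + reserve_output <= window
```

Anything that fails this check is a candidate for chunking, retrieval, or routing to a longer-context model.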
Self-hosting economics
V4 is open-weight, so you can self-host at zero marginal token cost. But 1T parameters requires a serious GPU cluster; a rough estimate is an 8x H100 setup for inference, around $25K/month in cloud GPU costs. That's only worth it at extremely high volume (millions of requests per month).
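The break-even point falls straight out of the numbers above: a fixed ~$25K/month cluster versus the ~$0.006-per-task API cost from the earlier table (both figures rough estimates):

```python
def breakeven_tasks(monthly_fixed: float = 25_000,
                    api_cost_per_task: float = 0.006) -> float:
    """Monthly task volume at which self-hosting's fixed cost
    equals what the API would have charged."""
    return monthly_fixed / api_cost_per_task

print(f"{breakeven_tasks():,.0f} tasks/month")  # → 4,166,667 tasks/month
```

That's ~4.2M tasks per month before the cluster pays for itself, which is why "millions of requests/month" is the threshold, and it ignores ops staffing and utilization gaps that push the real break-even higher.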