
Llama 4 Maverick

  • Parameters: 400B total, 17B active (MoE)
  • Context: 10M tokens
  • Max output: 64K tokens
  • Architecture: Mixture-of-Experts, 17B active per query
  • Pricing (per 1M tokens): Free

Benchmark scores

  • SWE-bench Verified: 68.5
  • GPQA Diamond: 78

Available via: Self-hosted, OpenRouter, Together AI, Fireworks

Llama 4 Maverick pushes the open-source frontier with a 10M token context window and competitive benchmark scores at zero licensing cost.

Benchmarks

Benchmark            Score    Notes
SWE-bench Verified   ~68.5%   Strong for open-weight
GPQA Diamond         ~78.0%   Approaching closed-source models

Pricing

Self-hosted: Free (open weights, permissive license). Hosted providers vary:

Provider       Input/Output (per 1M tokens)
Together AI    ~$0.80 / $0.80
OpenRouter     ~$0.50 / $0.50
Fireworks      ~$0.60 / $0.60
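To compare providers for a given workload, the approximate rates above can be plugged into a simple cost estimate. This is a sketch: the rates are the rounded figures from the table, and the monthly token volumes in the example are hypothetical placeholders.

```python
# Rough cost comparison across hosted providers, using the approximate
# per-1M-token rates from the table above (subject to change).
PRICES = {  # (input_rate, output_rate) in USD per 1M tokens
    "Together AI": (0.80, 0.80),
    "OpenRouter": (0.50, 0.50),
    "Fireworks": (0.60, 0.60),
}

def monthly_cost(input_tokens: int, output_tokens: int, provider: str) -> float:
    """Estimate USD cost for one month of traffic on a single provider."""
    in_rate, out_rate = PRICES[provider]
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# Hypothetical workload: 100M input + 20M output tokens per month.
for name in PRICES:
    print(f"{name}: ${monthly_cost(100_000_000, 20_000_000, name):.2f}")
```

Note that at these symmetric input/output rates, provider ranking is independent of the input/output mix; with asymmetric pricing it would not be.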

Architecture

Mixture-of-Experts with 400B total parameters but only 17B active per query. This means:

  • Inference speed comparable to a 17B dense model
  • Quality approaching a 400B dense model
  • Dramatically lower GPU requirements than parameter count suggests
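The routing idea behind these properties can be sketched in a few lines: a learned router scores all experts per token, but only the top-k are actually run, so compute per token scales with k rather than with the total expert count. This toy NumPy version uses made-up sizes (16 experts, top-2, 8-dim vectors) and single-matrix "experts"; it illustrates the mechanism only, not Llama 4's actual layer design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MoE layer: many experts, but each token is routed to only top_k
# of them, so per-token compute scales with top_k, not n_experts.
n_experts, top_k, d_model = 16, 2, 8
router_w = rng.normal(size=(d_model, n_experts))          # router weights
experts = rng.normal(size=(n_experts, d_model, d_model))  # toy experts: one matrix each

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector through its top-k experts, mixed by a softmax gate."""
    logits = x @ router_w
    top = np.argsort(logits)[-top_k:]             # indices of the top-k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                          # softmax over the selected experts only
    # Only top_k expert matrices are touched; the other experts stay idle.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

y = moe_forward(rng.normal(size=d_model))
print(y.shape)  # (8,)
```

Here only 2 of 16 experts run per token, mirroring how Maverick activates 17B of its 400B parameters per query, while all expert weights still have to sit in memory.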

Llama 4 family

Model      Params   Active   Context   Use case
Scout      109B     17B      10M       Efficient, long-context
Maverick   400B     17B      10M       Quality-focused

Strengths

  • 10M token context window, roughly 10x larger than most competitors
  • Zero cost for self-hosted deployment
  • MoE architecture keeps inference fast despite 400B params
  • Open weights enable fine-tuning and customization

Weaknesses

  • Benchmark scores trail frontier closed-source models by 10-15 points
  • Requires significant GPU resources for self-hosting (multiple A100s/H100s)
  • No official hosted API from Meta
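The GPU requirement follows directly from the parameter count: all 400B parameters must be resident in memory even though only 17B are active per query. A back-of-envelope estimate (weights only, ignoring KV cache, activations, and framework overhead, and assuming 80GB cards):

```python
import math

# Weights-only memory estimate for self-hosting the full 400B model.
# All experts must be resident even though only 17B params are active
# per query. Ignores KV cache, activations, and framework overhead.
PARAMS = 400e9
BYTES_PER_PARAM = {"fp16/bf16": 2, "int8": 1, "int4": 0.5}
GPU_MEM_GB = 80  # e.g. an A100 or H100 80GB

for precision, b in BYTES_PER_PARAM.items():
    weights_gb = PARAMS * b / 1e9
    gpus = math.ceil(weights_gb / GPU_MEM_GB)
    print(f"{precision}: ~{weights_gb:.0f} GB of weights, >= {gpus} x 80GB GPUs")
```

Even aggressively quantized to int4, the weights alone need around 200GB, which is why multi-GPU nodes are the practical floor for self-hosting.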