
Llama 4 Maverick

  • Parameters: 400B total, 17B active (MoE)
  • Context: 10M tokens
  • Max output: 64K tokens
  • Architecture: Mixture-of-Experts, 17B active per query
  • Pricing (per 1M tokens): Free

Benchmark scores

  • SWE-bench Verified: 68.5
  • GPQA Diamond: 78

Available via: Self-hosted, OpenRouter, Together AI, Fireworks

Llama 4 Maverick pushes the open-source frontier with a 10M token context window and competitive benchmark scores at zero licensing cost.

Benchmarks

Benchmark            Score    Notes
SWE-bench Verified   ~68.5%   Strong for open-weight
GPQA Diamond         ~78.0%   Approaching closed-source models

Pricing

Self-hosted: Free (open weights, permissive license). Hosted providers vary:

Provider       Input/Output (per 1M tokens)
Together AI    ~$0.80 / $0.80
OpenRouter     ~$0.50 / $0.50
Fireworks      ~$0.60 / $0.60
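To compare providers for a given workload, the approximate rates above can be plugged into a simple cost estimate. This is a sketch: the rates are the rounded figures from the table, and the monthly token volumes in the example are hypothetical placeholders.

```python
# Rough cost comparison across hosted providers, using the approximate
# per-1M-token rates from the table above (subject to change).
PRICES = {  # (input_rate, output_rate) in USD per 1M tokens
    "Together AI": (0.80, 0.80),
    "OpenRouter": (0.50, 0.50),
    "Fireworks": (0.60, 0.60),
}

def monthly_cost(input_tokens: int, output_tokens: int, provider: str) -> float:
    """Estimate USD cost for one month of traffic on a single provider."""
    in_rate, out_rate = PRICES[provider]
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# Hypothetical workload: 100M input + 20M output tokens per month.
for name in PRICES:
    print(f"{name}: ${monthly_cost(100_000_000, 20_000_000, name):.2f}")
```

Note that at these symmetric input/output rates, provider ranking is independent of the input/output mix; with asymmetric pricing it would not be.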

Architecture

Mixture-of-Experts with 400B total parameters but only 17B active per query. This means:

  • Inference speed comparable to a 17B dense model
  • Quality approaching a 400B dense model
  • Dramatically lower GPU requirements than parameter count suggests
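The routing idea behind these properties can be sketched in a few lines: a learned router scores all experts per token, but only the top-k are actually run, so compute per token scales with k rather than with the total expert count. This toy NumPy version uses made-up sizes (16 experts, top-2, 8-dim vectors) and single-matrix "experts"; it illustrates the mechanism only, not Llama 4's actual layer design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MoE layer: many experts, but each token is routed to only top_k
# of them, so per-token compute scales with top_k, not n_experts.
n_experts, top_k, d_model = 16, 2, 8
router_w = rng.normal(size=(d_model, n_experts))          # router weights
experts = rng.normal(size=(n_experts, d_model, d_model))  # toy experts: one matrix each

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector through its top-k experts, mixed by a softmax gate."""
    logits = x @ router_w
    top = np.argsort(logits)[-top_k:]             # indices of the top-k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                          # softmax over the selected experts only
    # Only top_k expert matrices are touched; the other experts stay idle.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

y = moe_forward(rng.normal(size=d_model))
print(y.shape)  # (8,)
```

Here only 2 of 16 experts run per token, mirroring how Maverick activates 17B of its 400B parameters per query, while all expert weights still have to sit in memory.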

Llama 4 family

Model      Params   Active   Context   Use case
Scout      109B     17B      10M       Efficient, long-context
Maverick   400B     17B      10M       Quality-focused

Strengths

  • 10M token context window, roughly 10x larger than most competitors
  • Zero cost for self-hosted deployment
  • MoE architecture keeps inference fast despite 400B params
  • Open weights enable fine-tuning and customization

Weaknesses

  • Benchmark scores trail frontier closed-source models by 10-15 points
  • Requires significant GPU resources for self-hosting (multiple A100s/H100s)
  • No official hosted API from Meta
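The GPU requirement follows directly from the parameter count: all 400B parameters must be resident in memory even though only 17B are active per query. A back-of-envelope estimate (weights only, ignoring KV cache, activations, and framework overhead, and assuming 80GB cards):

```python
import math

# Weights-only memory estimate for self-hosting the full 400B model.
# All experts must be resident even though only 17B params are active
# per query. Ignores KV cache, activations, and framework overhead.
PARAMS = 400e9
BYTES_PER_PARAM = {"fp16/bf16": 2, "int8": 1, "int4": 0.5}
GPU_MEM_GB = 80  # e.g. an A100 or H100 80GB

for precision, b in BYTES_PER_PARAM.items():
    weights_gb = PARAMS * b / 1e9
    gpus = math.ceil(weights_gb / GPU_MEM_GB)
    print(f"{precision}: ~{weights_gb:.0f} GB of weights, >= {gpus} x 80GB GPUs")
```

Even aggressively quantized to int4, the weights alone need around 200GB, which is why multi-GPU nodes are the practical floor for self-hosting.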