DeepSeek V4-Pro's 75% Price Cut Is Now Permanent

DeepSeek confirmed on May 22, 2026, that the promotional 75% discount on V4-Pro is no longer temporary. It is the new baseline. The promo was scheduled to expire May 31. Instead, it became the standard rate before that date arrived.

That is a meaningful signal. Price anchoring matters in API markets, and DeepSeek is anchoring hard at the low end.

What the Numbers Actually Look Like

The new permanent rates for DeepSeek V4-Pro:

Token type	Old price	New price
Input	$1.74 / 1M	$0.435 / 1M
Output	$3.48 / 1M	$0.87 / 1M
Cache hit (input)	$0.145 / 1M	$0.003625 / 1M

That cache hit number deserves its own moment. $0.003625 per million input tokens for cache hits. If your system prompt is long and static, you are paying almost nothing for it on repeat calls. For agents with large tool-call schemas baked into the context, this is where the real savings stack up.

For comparison, Claude Opus 4.7 output tokens cost in the range of $75 per million via API. V4-Pro output is now $0.87. The gap is roughly 86x on output alone. Even if you find V4-Pro meaningfully weaker for your specific workload, the economics create an uncomfortable question: how much capability gap can you afford to pay for?

What V4-Pro Actually Is

Released April 24, 2026, V4-Pro is a 1.6 trillion parameter Mixture-of-Experts model. Only 49 billion parameters activate per forward pass, which keeps inference cost closer to a 49B dense model despite the much larger knowledge capacity.

The architecture changed from V3. V4 introduces Compressed Sparse Attention (CSA) combined with Heavily Compressed Attention (HCA), which cuts per-token FLOPs to roughly 27% of V3.2 and reduces KV-cache memory to about 10% of V3.2. That 1M-token context window is usable in practice, not just a spec sheet number.

Training used the Muon optimizer instead of AdamW for most parameters. DeepSeek reports faster convergence and more stable training at this scale.

Weights are on Hugging Face under the MIT license. Commercial use, fine-tuning, self-hosting, all included.

Where It Performs and Where It Does Not

On coding benchmarks, V4-Pro is genuinely competitive. It scores 80.6% on SWE-bench Verified, 93.5 on LiveCodeBench, and 67.9% on Terminal-Bench 2.0. That Terminal-Bench number beats Claude Opus 4.6's 65.4%, and Terminal-Bench involves actual autonomous terminal execution with a 3-hour timeout window. That is a more realistic signal for agentic coding than most single-turn benchmarks.

Where it falls short: HLE (Humanity's Last Exam) at 37.7% versus Claude's 40% and GPT-5.4's 39.8%. SimpleQA-Verified at 57.9% versus Gemini's 75.6% shows a clear factual knowledge retrieval gap. If your use case depends on accurate real-world knowledge recall rather than code generation or reasoning chains, those gaps are worth taking seriously.

The honest framing: V4-Pro is a strong coding and reasoning model. It is not uniformly better than closed frontier models. It is better than them on price by a large margin and better than them on several coding benchmarks specifically. If you are building a coding agent or an agentic system with structured tool calls, the price-performance math makes it worth serious evaluation.

Why Agentic Workloads Specifically

DeepSeek explicitly designed V4 around agentic use. The architecture supports long-context persistence across tool rounds, and the post-training pipeline was shaped around tool-heavy trajectories rather than static Q&A.

This matters for a practical reason. Agentic loops burn tokens fast. A multi-step planning agent that calls five tools, synthesizes results, and iterates can consume hundreds of thousands of tokens per session. At $3.48 per million output tokens, that adds up quickly. At $0.87, the same loop costs a quarter of that. The economics of running 24/7 autonomous agents shift substantially.

The 90% cache hit discount compounds this further. If your agents share a large system context across sessions, caching that context and paying $0.003625 per million on cache hits instead of $0.435 changes the per-session cost structure dramatically.

Self-Hosting Is a Real Option

V4-Pro at 1.6T total parameters is not a laptop deployment. You need serious multi-GPU infrastructure. V4-Flash (284B total, 13B active) is more accessible and still scores 79% on SWE-bench Verified with $0.28 per million output tokens via the API.

The open MIT weights mean you can take either model off the API entirely. For teams with data privacy requirements or compliance constraints, self-hosting is actually viable in a way that closed models from OpenAI or Anthropic cannot offer. The trade-off is operational overhead, hardware cost, and the fact that you lose DeepSeek's infrastructure guarantees.

The Practical Decision

If you are not already running V4-Pro in your eval suite, run it this week. The pricing is no longer a promotional reason to test it. It is the permanent economics you are comparing against.

For high-volume coding agents and structured reasoning pipelines, V4-Pro is the default cost-efficient option right now. For workloads requiring strong factual recall, long-horizon agentic reliability, or multimodal capability, the closed frontier models still hold structural advantages that justify the cost.

The legacy deepseek-chat and deepseek-reasoner API aliases retire July 24, 2026. If you are on those endpoints, migration to deepseek-v4-pro or deepseek-v4-flash is on the near-term list regardless of this pricing news.

That is a meaningful signal. Price anchoring matters in API markets, and DeepSeek is anchoring hard at the low end.

What the Numbers Actually Look Like

The new permanent rates for DeepSeek V4-Pro:

Token type	Old price	New price
Input	$1.74 / 1M	$0.435 / 1M
Output	$3.48 / 1M	$0.87 / 1M
Cache hit (input)	$0.145 / 1M	$0.003625 / 1M

What V4-Pro Actually Is

Training used the Muon optimizer instead of AdamW for most parameters. DeepSeek reports faster convergence and more stable training at this scale.

Weights are on Hugging Face under the MIT license. Commercial use, fine-tuning, self-hosting, all included.

Where It Performs and Where It Does Not

Why Agentic Workloads Specifically

Self-Hosting Is a Real Option

The Practical Decision

If you are not already running V4-Pro in your eval suite, run it this week. The pricing is no longer a promotional reason to test it. It is the permanent economics you are comparing against.

DeepSeek V4-Pro's 75% Price Cut Is Now Permanent

What the Numbers Actually Look Like

What V4-Pro Actually Is

Where It Performs and Where It Does Not

Why Agentic Workloads Specifically

Self-Hosting Is a Real Option

The Practical Decision

Arbind Singh

Comments

Leave a comment

Kubernetes vs Docker: Stop Comparing the Wrong Things

Claude Code Free Unlimited Setup with OpenCode Zen and Minimax M2.5

Google Released Gemma 4 for Free. Here Is Why That Makes Sense.

DeepSeek V4-Pro's 75% Price Cut Is Now Permanent

What the Numbers Actually Look Like

What V4-Pro Actually Is

Where It Performs and Where It Does Not

Why Agentic Workloads Specifically

Self-Hosting Is a Real Option

The Practical Decision

Arbind Singh

Comments

Leave a comment

Kubernetes vs Docker: Stop Comparing the Wrong Things

Claude Code Free Unlimited Setup with OpenCode Zen and Minimax M2.5

Google Released Gemma 4 for Free. Here Is Why That Makes Sense.