When Free AI Gets Good Enough to Hurt You

Josef Holm · 6 min read · April 24, 2026

Key Takeaways

DeepSeek V4 performs near the top proprietary models at zero cost; the benchmark gap is narrowing and the price gap is not.
The real question is not which model wins; it is whether your business has dependencies you cannot control when the market restructures.
Distillation accusations and chip export disputes are not background noise; they are variables that affect which AI tools you can rely on long-term.
Open-source AI at frontier performance means your competitors, including ones you have not spotted yet, can now build on the same tools you use.
Audit your vendor dependencies, test V4 against your actual use cases, and build internal judgment about which models to use where and why.

What actually happened with DeepSeek V4?

Most companies will spend the next week debating whether DeepSeek V4 is "as good as" GPT-5.4 or Gemini 3.1-Pro. Wrong question. The right question is what happens to your business when a free, open-source model from China performs within striking distance of the most expensive proprietary systems on the planet.

DeepSeek, the Hangzhou-based startup that rattled global markets in January 2025 with its R1 reasoning model, released preview versions of its V4 update on Friday. Two versions: a "pro" and a lighter "flash."

The headline numbers matter. DeepSeek claims V4 Pro Max shows superior performance on standard reasoning benchmarks relative to OpenAI's GPT-5.2 and Google's Gemini 3.0-Pro, falling only marginally short of GPT-5.4 and Gemini 3.1-Pro. On agentic capabilities, it claims to outperform Anthropic's Claude Sonnet 4.5 and approach Claude Opus 4.5.

Both versions support a 1 million token context window, up from 128,000 in V3. That's not incremental. That's a different category of capability entirely.

And it's open-source. Free to use, free to modify, free to build on.

Lian Jye Su, chief analyst at Omdia, put it plainly: "Based on the benchmark results, it does appear DeepSeek V4 is going to be very competitive against its U.S. rivals."
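To make the context-window jump concrete, here is a rough back-of-the-envelope sketch. It assumes about 4 characters per token, which is a crude heuristic for English text and not DeepSeek's actual tokenizer; the workload size and the output reserve are hypothetical numbers chosen for illustration.

```python
# Rough check of whether a body of text fits in a model's context window.
# ASSUMPTION: ~4 characters per token is a crude heuristic, not DeepSeek's
# tokenizer; real counts vary by tokenizer and content.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(
    texts: list[str], context_tokens: int, reserve_for_output: int = 8_000
) -> bool:
    """True if all texts plus a reserve for the model's reply fit in the window."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserve_for_output <= context_tokens


# Hypothetical workload: 300 source files of about 10,000 characters each.
files = ["x" * 10_000 for _ in range(300)]

print(fits_in_context(files, 128_000))    # False
print(fits_in_context(files, 1_000_000))  # True
```

Under these assumptions the codebase comes to roughly 750,000 tokens: far too large for a 128,000-token window, comfortable in a 1 million token one. Treat the numbers as an order-of-magnitude guide and measure with the real tokenizer before planning around them.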

Why does this matter beyond the benchmarks?

I've been through enough technology cycles to know the benchmarks are never the real story. What matters is what shifts in the competitive structure underneath.

Here's what's shifting. The cost floor for world-class AI just dropped again. DeepSeek has been doing this since it first appeared: matching or approaching frontier performance at a fraction of the cost. V4 continues that pattern. When a free model gets within striking distance of systems that cost enterprises serious money to access, it changes the math for every company making AI investment decisions.

Does that mean DeepSeek V4 replaces your current stack? No. Does it mean you need to understand what "good enough at zero cost" does to your competitive position? Yes, it does.

Some analysts are cautious. Ivan Su at Morningstar called V4 a "competent" follow-up but noted it's not the same kind of breakthrough as R1. Fair point. Competent follow-ups that maintain cost advantages compound over time, though. That's how market structures actually change.

What about the distillation accusations?

This is the part most business leaders are ignoring. They shouldn't be.

Anthropic accused DeepSeek and two other China-based labs of running "industrial-scale campaigns" to extract Claude's capabilities through distillation, a technique where you train a less capable model on the outputs of a stronger one. OpenAI made similar claims to U.S. lawmakers. This week, Michael Kratsios, Trump's chief science and technology adviser, accused foreign tech companies "principally based in China" of distilling leading U.S. AI systems.

China's embassy in Washington called the allegations "unjustified suppression of Chinese companies by the U.S."

I'm not going to adjudicate who's right. What I will say is this: the existence of these accusations tells you something real about the state of play. The lines between competition and extraction are blurring. If you're building your business on top of any AI platform, you need to understand that the technology you're depending on exists inside a geopolitical contest that is intensifying, not cooling down.

That's not a reason to panic. It's a reason to think clearly about dependency.
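If the mechanics of distillation are unfamiliar, the core idea is simple enough to sketch in a few lines. This is a toy illustration only: the `teacher` function is a hypothetical stand-in for a stronger model so the example runs offline, and it says nothing about what any lab actually did.

```python
# Sketch of output-based distillation: collect a stronger model's answers
# and use them as training data for a weaker one.
# ASSUMPTION: `teacher` is a toy stand-in; no real model or API is called.
from typing import Callable


def teacher(prompt: str) -> str:
    """Toy 'strong model': answers simple addition prompts like '2+3'."""
    a, b = prompt.split("+")
    return str(int(a) + int(b))


def build_distillation_set(
    prompts: list[str], model: Callable[[str], str]
) -> list[dict[str, str]]:
    """Pair each prompt with the teacher's output to form training examples."""
    return [{"prompt": p, "completion": model(p)} for p in prompts]


dataset = build_distillation_set(["2+3", "10+5"], teacher)

# A student model would then be fine-tuned on `dataset` with ordinary
# supervised training; that step is omitted here.
print(dataset)
```

The point for a business reader is the cost asymmetry: the teacher's capability took enormous investment to build, while collecting its outputs is comparatively cheap.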

Should companies care about the Huawei chip angle?

Yes, but probably not for the reason you think.

Reports suggest DeepSeek optimized V4 for Huawei's Ascend chips, while officials accused the company of using banned Nvidia Blackwell hardware before release. The specifics of who used what chip will get sorted out, or they won't. What matters for business leaders is the implication.

If DeepSeek can train competitive models on non-Nvidia hardware, U.S. chip export controls lose some of their strategic force. If it can't, and the Nvidia allegations hold, then the controls matter more than ever. Either way, the hardware supply chain underneath AI is becoming a strategic variable that affects model availability, pricing, and reliability.

Most companies don't think about this. They should. Your AI vendor's chip supply chain is now a business risk factor. That's genuinely new territory.

What does "open-source" actually mean here?

DeepSeek describes its technology as open-source, enabling developers to access, modify, and build on its core technology. That's a different posture from Anthropic, Google, and OpenAI, which keep their top models proprietary.

Open-source AI at this performance level creates a specific dynamic. Smaller companies and developers in markets around the world can build on frontier-class technology without paying frontier-class prices. A Microsoft report from January showed DeepSeek gaining ground in many developing nations. V4 will accelerate that.

For established enterprises, this creates both opportunity and pressure. Opportunity because you can experiment with powerful models at low cost. Pressure because your competitors can too, including ones you haven't identified yet, in markets you might not be watching closely.

How should business leaders actually think about this?

Here's the framework I'd use.

Don't get caught in the benchmark horse race. Whether V4 beats GPT-5.2 by 3% on some reasoning test doesn't change your strategy. What changes your strategy is the overall trend: the gap between the best proprietary models and the best open-source models is narrowing, and the cost difference remains enormous.

Audit your AI dependencies. If you're locked into a single vendor's system, you're exposed to pricing changes, geopolitical disruptions, and capability shifts you can't control. Competitive open-source alternatives give you options, but only if you've built the internal capability to evaluate and use them.

Watch the agentic capabilities closely. DeepSeek tuned V4 for agent tools like Anthropic's Claude Code and OpenClaw. The agentic layer, where AI models perform complex tasks on their own, is where the real business value is heading. If open-source models can match proprietary ones on agentic tasks, the competitive implications are major.

Factor geopolitics into your technology decisions. This isn't optional anymore. The distillation accusations, the chip restrictions, the diplomatic tensions: these aren't background noise. They're variables that affect which tools you can rely on and for how long.

This is exactly the kind of shift we built the AI Operating Review to help companies work through. Not the hype cycle. The structural changes underneath it.
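Testing models against your own workloads rather than public benchmarks, as the framework above recommends, does not require much machinery to start. Here is a minimal sketch, assuming each model can be wrapped as a plain function from prompt to text. The test cases, checkers, and the two stand-in models are all hypothetical; in practice you would substitute real endpoints and cases drawn from your own work.

```python
# Minimal workload-specific evaluation harness.
# ASSUMPTION: each "model" is a plain callable; in practice these would wrap
# whichever vendor or self-hosted endpoints you are comparing.
from typing import Callable

Model = Callable[[str], str]

# Hypothetical test cases from your own workload: (prompt, pass/fail checker).
CASES: list[tuple[str, Callable[[str], bool]]] = [
    ("Extract the invoice total from: 'Total due: $420.00'",
     lambda out: "420" in out),
    ("Reply with the ISO code for Germany",
     lambda out: out.strip().upper() == "DE"),
]


def pass_rate(model: Model) -> float:
    """Fraction of cases whose output satisfies its checker."""
    passed = sum(1 for prompt, check in CASES if check(model(prompt)))
    return passed / len(CASES)


# Stand-in models so the sketch runs offline.
def model_a(prompt: str) -> str:
    return "420.00" if "invoice" in prompt else "de"


def model_b(prompt: str) -> str:
    return "420.00" if "invoice" in prompt else "Germany"


results = {
    name: pass_rate(m)
    for name, m in {"model_a": model_a, "model_b": model_b}.items()
}
print(results)  # {'model_a': 1.0, 'model_b': 0.5}
```

Because every model sits behind the same callable interface, swapping a proprietary model for an open-source one becomes a one-line change. That is also a quick way to see how locked in you really are.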

Where does this leave us?

DeepSeek V4 isn't a revolution. It's a confirmation. The pattern that started with R1 is continuing: competitive AI performance is becoming available at dramatically lower cost, from a company operating under very different rules than its American competitors.

The companies that handle this well will treat it as a planning input, not a news story. They'll test V4 against their actual use cases. They'll map their vendor dependencies. They'll build internal judgment about which models to use where, and why.

The companies that don't handle it well will still be debating benchmarks next week while the ground shifts underneath them.

I've seen this pattern before, across different technologies and different decades. The moment "good enough and free" enters a market dominated by "best and expensive," the market restructures. Not overnight. Faster than most people expect, though.

The question isn't whether DeepSeek V4 is better than GPT-5.4. The question is whether your organization has the clarity to make good decisions in a world where both exist. If you're not sure, that's a conversation worth having.

Frequently Asked Questions

Is DeepSeek V4 better than GPT-5.4 or Gemini 3.1-Pro?

V4 Pro Max claims superior benchmark performance over GPT-5.2 and Gemini 3.0-Pro, falling slightly short of GPT-5.4 and Gemini 3.1-Pro. That ranking will shift with every new release. The more important fact is that a free, open-source model is operating in the same performance range as systems that cost enterprises serious money to access.

Should my company use DeepSeek V4 instead of OpenAI or Anthropic?

Not automatically, and not without evaluation. The right move is to test V4 against your actual use cases and measure results. 'Free and close enough' is a real strategic option for many workloads. But you also need to weigh geopolitical risk, data handling, and your team's capacity to work with an open-source model.

What are the business risks of relying on DeepSeek or other Chinese AI models?

Three categories: geopolitical disruption (export controls, diplomatic tensions, or regulatory action could affect availability), data handling (open-source does not mean your data is automatically safe), and supply chain uncertainty (the chip accusations suggest the model's training environment is contested). These are planning inputs, not reasons to panic.

What does the 1 million token context window in DeepSeek V4 actually change?

It moves V4 into a different category of use case. A 128K context window handles long documents. A 1 million token window handles entire codebases, large datasets, or extended multi-step reasoning tasks. That is not incremental; it opens up workloads that were previously impractical with most models.

What is AI model distillation and why does it matter for businesses?

Distillation is training a smaller or cheaper model on the outputs of a more powerful one, effectively transferring capability at low cost. The accusations against DeepSeek matter because if true, they mean the competitive gap between U.S. and Chinese AI labs may be narrower than the underlying R&D investment would suggest. For businesses, it signals that the AI competitive space can shift faster than most vendor roadmaps will tell you.

How should business leaders respond to rapid changes in the AI model market?

Three concrete steps: audit your current AI vendor dependencies and identify where you are locked in, test open-source alternatives against your specific workloads rather than relying on benchmark rankings, and factor geopolitical and supply chain variables into your technology planning the same way you would any other business risk. If you do not have internal judgment to do that well, build it or get outside help.