Channel: Joab Jackson, Author at The New Stack

ICYMI: DeepSeek Is an Open Source Success Story


Nvidia stockholders have about $400 billion less in their pockets today, thanks to the open source efforts of a Chinese AI startup called DeepSeek. And OpenAI, as well as the other commercial AI model providers, might have to rethink their business models.

And the $500 billion in data centers the U.S. needs to keep its competitive edge for AI? Might not be necessary after all.

Kicking off the hysteria last weekend was the release of DeepSeek’s R1 reasoning model, which showed results comparable to OpenAI’s latest release, o1, at roughly 1/50th of the training cost.

Just as importantly, R1 was released as fully open source, under the very permissive MIT License, so any other company can replicate the model.

The implication, should DeepSeek’s promises test out, is that companies no longer would be beholden to OpenAI and other multibillion-dollar commercial services to build their own generative AI apps.

“DeepSeek has provided a massive gift to nearly everyone,” observed Stratechery analyst Ben Thompson. “The biggest winners are consumers and businesses who can anticipate a future of effectively free AI products and services.”

From Crisis Comes Opportunity

DeepSeek came up with this new architecture chiefly because of constraints: U.S. export controls barred it from buying the Nvidia H100, Nvidia’s latest GPU.

Worried about maintaining AI superiority, the United States had prohibited the sale of Nvidia’s H100 chips to China. In response, Nvidia came up with the stripped-down H800, and DeepSeek engineers optimized their LLM around it instead.

Some clever engineering demonstrated that training leading edge models doesn’t necessarily require more interchip memory bandwidth.

“All of the decisions DeepSeek made in the design of this model only make sense if you are constrained to the H800,” Thompson wrote. “If DeepSeek had access to H100s, they probably would have used a larger training cluster with much fewer optimizations specifically focused on overcoming the lack of bandwidth.”

So DeepSeek HAD to use the slower H800 instead, and do a lot of fiddly engineering to fit its LLM onto less capable hardware.

The results were so good, in fact, that they raised questions about whether the industry actually needs $500 billion worth of data center infrastructure, or expensive Nvidia chips, when DeepSeek could do the job at a fraction of the cost.

It showed that large data centers aren’t required just to serve up AI.

Open Source at the Heart

Don’t worry if you haven’t heard of DeepSeek before. It’s one of a number of Chinese startups competing in the large model market.

The DeepSeek LLM actually started as a side project for the company, something to do with its leftover GPUs.

The CEO has made it a point of pride to open source the company’s work and publish papers about it.

“Open source, publishing papers, in fact, do not cost us anything. For technical talent, having others follow your innovation gives a great sense of accomplishment. In fact, open source is more of a cultural behavior than a commercial one,” DeepSeek CEO Liang Wenfeng told China Talk.

The company didn’t go this route for the money but did it for recognition and, just as importantly, to attract better talent. And to level the playing field.

“In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed source approach can’t prevent others from catching up,” he said.

The Technical Side

In looking to cut computational costs, the engineering team adopted practices that were innovative for the AI space.

Crucial to this success is a technique called distillation, in which, in a nutshell, one model is trained on the outputs of another, typically larger, LLM. This is where DeepSeek’s open source claims get murky, noted OpenUK CEO Amanda Brock in an upcoming post for TNS. She worries about the legality of the upstream data used to train DeepSeek.
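In miniature, classic distillation trains a small “student” model to match a “teacher’s” softened output distribution. Here is a toy NumPy sketch; the teacher logits, sizes and learning rate are all invented for illustration, standing in for outputs queried from a larger LLM:

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Softmax with a temperature knob; higher T gives softer targets."""
    z = np.asarray(z, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical "teacher" logits for a 4-class toy problem.
teacher_logits = np.array([[4.0, 1.0, 0.5, 0.2],
                           [0.3, 3.5, 0.8, 0.1]])
soft_targets = softmax(teacher_logits, temperature=2.0)

# Tiny linear "student": one weight matrix trained to match the
# teacher's soft targets by minimizing cross-entropy.
rng = np.random.default_rng(0)
x = np.eye(2)                        # two toy inputs, one-hot encoded
W = rng.normal(scale=0.1, size=(2, 4))

for _ in range(500):
    probs = softmax(x @ W)
    grad = x.T @ (probs - soft_targets)  # cross-entropy gradient w.r.t. logits
    W -= 0.5 * grad

student = softmax(x @ W)
# The student's distribution now closely tracks the teacher's.
print(np.abs(student - soft_targets).max())
```

The student never sees the original training data, only the teacher’s outputs, which is exactly why Brock’s questions about the upstream data are hard to answer from the student model alone.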

But the model itself is less important than how the model is built.

Another tactic was to return to an older technique, reinforcement learning, to improve reasoning abilities. Traditional models use reinforcement learning with human feedback (RLHF) to guide them toward correct responses. DeepSeek built the reinforcement learning directly into training: the model works out several responses and then, using self-guided reasoning, chooses the one that seems most correct.

“You don’t need to teach the AI how to reason, you can just give it enough compute and data, and it will teach itself!” Thompson wrote.
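In drastically simplified form, the loop amounts to scoring several candidate answers with a verifiable, rule-based reward and preferring the best one. This sketch uses a toy arithmetic task with hard-coded candidates standing in for sampled model responses; everything here is illustrative:

```python
def rule_based_reward(question, answer):
    """Verifiable reward: 1.0 if the answer is arithmetically correct.
    A stand-in for checkable, rule-based rewards that replace
    human preference labels."""
    a, b = question
    return 1.0 if answer == a + b else 0.0

question = (17, 25)
# Hypothetical candidate responses, standing in for several
# completions sampled from the model.
candidates = [40, 41, 42, 43, 44]

# Score every candidate, then prefer the highest-reward one -- the
# selection step behind "work out several responses, choose the most
# correct," which would then be reinforced during training.
scored = [(rule_based_reward(question, c), c) for c in candidates]
best_reward, best_answer = max(scored)
print(best_answer)  # 42
```

Because the reward is computed by a rule rather than a human rater, the loop can run at whatever scale the compute budget allows.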

R1 built on a lot of engineering work that came in DeepSeek’s previous releases, the V2 and V3 models. V2 introduced the idea of a “mixture of experts” (MoE), meaning that not every part of the model fires up for each question, thereby saving memory (OpenAI’s GPT-4 reportedly also uses MoE, though not at the same granularity).
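The MoE routing idea fits in a few lines: a gate scores every expert, only the top-k actually run, and their outputs are blended. The sizes and the tiny linear “experts” below are purely illustrative; in a real LLM each expert is a full feed-forward block:

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route input x to the top-k experts by gate score; only those
    experts are evaluated, which is the compute saving MoE buys."""
    scores = x @ gate_w                 # one score per expert
    top = np.argsort(scores)[-k:]       # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()            # softmax over the chosen k only
    # The other (n_experts - k) experts are never run for this input.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
dim, n_experts = 4, 8
gate_w = rng.normal(size=(dim, n_experts))
# Each "expert" here is just a small linear map.
expert_ws = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]
experts = [lambda x, W=W: x @ W for W in expert_ws]

x = rng.normal(size=dim)
y = moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (4,)
```

With k=2 of 8 experts active, only a quarter of the expert parameters are exercised per token, which is where the headline efficiency comes from.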

V3 refined the MoE design and brought further memory savings by compressing the key-value cache that stores the model’s working view of the context window (including user-supplied data).
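A back-of-the-envelope calculation shows why shrinking the key-value cache matters so much at long context lengths. All of the numbers below are illustrative, not DeepSeek’s actual configuration:

```python
# Rough KV-cache memory for one sequence: keys AND values (the 2x)
# are cached at every layer for every token in the context.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_val=2):
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_val

# Uncompressed cache for a hypothetical 60-layer model at 32k context.
full = kv_cache_bytes(layers=60, kv_heads=32, head_dim=128, seq_len=32_000)

# Compressing each token's keys/values into one small latent vector
# (loosely, the idea behind DeepSeek's latent-attention compression)
# slashes the per-token footprint.
compressed = kv_cache_bytes(layers=60, kv_heads=1, head_dim=512, seq_len=32_000)

print(f"{full / 2**30:.1f} GiB vs {compressed / 2**30:.1f} GiB")
```

Tens of gigabytes per sequence versus a few is the difference between needing exotic hardware to serve long contexts and not.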

“V3 was shockingly cheap to train. DeepSeek claimed the model training took [~2.8 million] H800 GPU hours, which, at a cost of $2/GPU hour, comes out to a mere $5.576 million,” Thompson wrote.

That’s roughly $5.6 million, not the $100 billion that OpenAI has claimed it needs to train its leading-edge models.

The greater efficiency is also reflected in the cost of DeepSeek’s API service, which offered a million tokens at $2.19, compared with $60 per million tokens for OpenAI’s o1.
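The quoted figures are easy to sanity-check. The 2.788 million GPU-hour count is the figure DeepSeek reported for V3 training, which the article rounds:

```python
# Training cost: claimed GPU hours times the assumed hourly rate.
gpu_hours = 2.788e6          # H800 GPU hours claimed for V3 training
cost_per_hour = 2.0          # assumed $/GPU-hour
training_cost = gpu_hours * cost_per_hour
print(f"${training_cost:,.0f}")                          # $5,576,000

# API pricing gap between DeepSeek and OpenAI's o1.
deepseek_per_m = 2.19        # DeepSeek API, $ per million tokens
openai_per_m = 60.0          # OpenAI o1, $ per million tokens
print(f"~{openai_per_m / deepseek_per_m:.0f}x cheaper")  # ~27x cheaper
```

Note that the training figure covers the final training run only, not research, staff or prior experiments, a caveat Thompson himself makes.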

You can test DeepSeek via a downloadable app or through its web chat interface.

Lost Value

The implications, should DeepSeek prove to be as cost-effective as promised, could be massive.

It shows, as Thompson points out, that “OpenAI does not have some sort of special sauce that can’t be replicated.”

Liang’s view was that OpenAI, Nvidia and Oracle were using large computing requirements as a barrier to entry in the AI market, and some good open source know-how just obliterated that moat.

Companies can build generative AI apps at a fraction of what they would previously have cost, thanks to radically lower inference costs. They would still benefit from speedy Nvidia GPUs, but they wouldn’t necessarily need data centers filled with them.

The previously sky-high stock valuation of GPU provider Nvidia in particular benefitted from the company’s perceived lock-in value: the proprietary CUDA programming model and the ability to lash together multiple GPUs into larger systems, Thompson noted.

“For Nvidia, this is scary. Their entire business model is built on selling super-expensive GPUs with 90% margins. If everyone can suddenly do AI with regular gaming GPUs … well, you see the problem,” noted Dropbox VP Morgan Brown.

Likewise, OpenAI is now fully a consumer tech company facing a larger, more commoditized market. According to The Information, Meta’s generative AI leadership is worried about the massive cost of building its own Llama 4 model after it tested worse than DeepSeek.

Microsoft CEO Satya Nadella invoked the Jevons Paradox: “As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can’t get enough of.”

Even U.S. President Donald J. Trump recognized the power of open source in a recent press briefing, even if he didn’t call it out by name. “The release of DeepSeek AI from a Chinese company should be a wake-up call for our industries,” he said.

“Deepseek R1 is one of the most amazing and impressive breakthroughs I’ve ever seen — and as open source, a profound gift to the world,” Andreessen Horowitz cofounder Marc Andreessen commented.

The post ICYMI: DeepSeek Is an Open Source Success Story appeared first on The New Stack.

