Alumni @ RopesTalk: DeepSeek Deep Dive with Dr. Vasanth Sarathy, Tufts University

Podcast
February 26, 2025
26:14 minutes
Speakers:
Regina Sam Penti,
Vasanth Sarathy

On this special edition of Ropes & Gray’s Alumni @ RopesTalk podcast series, technology and IP transactions partner Regina Sam Penti is joined by Dr. Vasanth Sarathy, a professor of computer science at Tufts University and a Ropes & Gray alum. Together, they delve into the technical and business implications of DeepSeek, a Chinese app and model that has generated significant buzz in the AI world. They explore whether DeepSeek is a game-changer in AI, its cost advantages, and the potential impact on AI investments and startups. Dr. Sarathy also addresses the privacy and security concerns associated with DeepSeek, especially given its origins in China. Tune in to understand the reality behind the hype and what it means for businesses considering AI adoption.


Transcript:

Regina Sam Penti: Hi. Welcome to this episode of Alumni @ RopesTalk. My name is Regina Penti—I’m a partner in the IP transactions practice here at Ropes & Gray and have advised dozens of companies on AI strategy and adoption. Today, we are discussing DeepSeek, the Chinese app and model. Many claims have been made about what this app and model mean for the AI world. Is it the killer of the current AI paradigm? Is it the fixer of AI energy issues? Is it a democratization of AI? And so on. My guest today is Dr. Vasanth Sarathy. Vasanth is a professor of computer science at Tufts University. His research is at the intersection of AI and natural language processing. He also advises companies seeking to bridge the gap between business and technology on a regular basis. Vasanth is also a fully trained lawyer and a Ropes alum. Prior to going into academia, Vasanth was an associate here at Ropes & Gray for almost a decade. So, we’re really excited to welcome him back to be discussing this really important topic. Welcome, Vasanth.

Dr. Vasanth Sarathy: Hi, thanks for having me.

Regina Sam Penti: Vasanth and I will focus today on the technical and business implications of DeepSeek, and we’ll try to sift the reality from the hype. For anyone interested in the legal risks and considerations regarding DeepSeek, I coauthored a Ropes & Gray alert which addresses these points—please reach out to me if you’d like a copy. Let’s just jump right in. For our listeners who may not be fully up to date on what DeepSeek is, how do you describe it? What makes it special? And why has it generated so much excitement, and if I must say, confusion?

Dr. Vasanth Sarathy: DeepSeek, at a very high level, is a large language model (“LLM”) that was designed and built in China by a hedge fund. The underlying idea here is that, traditionally, software engineering involves somebody writing a program—you take some input, you write the program—and it does some computations and produces an output. Machine learning came along and said, “You don’t have to write the program. We’ll just give it a bunch of inputs and outputs, and it’ll produce the program for us.” Deep learning took that to a whole new level, where you now have much more complex systems. One way to think about these systems is as a big, giant machine with a whole bunch of knobs on it. When these machines get trained, those knobs get automatically adjusted, or tuned, so that in the future, when you have some input, you get the right kind of output. Language modeling is one subset of all of this, where you’re trying to predict the next word. That’s essentially what you’re doing—you’re looking at the existing set of words and asking, “What is the most likely next word?” Large language models do that, except that they were trained on very large amounts of data—trillions of words from across the internet. ChatGPT, which people are aware of, is an example of a large language model. There are others, like Claude. DeepSeek is in that same family of large language models.
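
To make that “predict the next word” idea concrete, here is a toy sketch—a simple word-pair counter, nothing like DeepSeek’s actual training code—showing the same objective on a miniature scale:

```python
from collections import Counter, defaultdict

# Toy illustration of next-word prediction: count which words tend to
# follow each word, then predict the most frequent follower. Real LLMs
# learn billions of parameters instead of counting pairs, but the
# objective -- "what is the most likely next word?" -- is the same.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word: str) -> str:
    # Return the word most often seen after `word` in the training text.
    return following[word].most_common(1)[0][0]

print(predict_next("the"))   # -> "cat" (ties broken by first occurrence)
print(predict_next("on"))    # -> "the"
```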

The interesting thing about DeepSeek is that when these large language models come out, they go through a whole bunch of what’s called “benchmarking,” which is, essentially, evaluating the models to see how well they do on a variety of different tasks. One particular group of tasks is framed as “reasoning.” You can think of reasoning as thinking through multiple steps. The goal of a lot of the new AI models is to be better reasoning models, so they actually think about an answer before they give it to you, or they’re able to do some kind of thinking internally. The DeepSeek model everyone is talking about is officially called R1, and it’s another flavor of these reasoning models. What’s interesting is that the performance of these models, particularly DeepSeek, isn’t that much better than existing models. Models like ChatGPT, and Meta’s model called Llama, all perform very similarly when it comes to reasoning tasks. And so, then the question is: Why did everything go crazy when DeepSeek came out?

One of the big reasons is that it was very cheap. If you were using the app, you were paying much less than you would pay for ChatGPT, and that got people’s interest. So, that was a big factor—it was significantly cheaper, I think around 30 times cheaper. Then, there’s the other factor, which is that even though DeepSeek’s performance is only comparable to, say, ChatGPT and its later versions—the most recent versions are called o1 and o3—DeepSeek does much better than a lot of the open-source models, which are models that are publicly available for anybody to use. Oftentimes, models like ChatGPT are behind a proprietary wall, and you don’t get to see the internals of these models at all. The most you can do is interact with the API—a fancy way of saying the interface that is publicly available—but the rest of the model isn’t. But DeepSeek came out and said, “You know what? Everyone, you can have it—here it is.” And so, it got a lot of interest and attention for that reason.

Regina Sam Penti: It seems we should be excited that there’s a new model on the block, possibly cheaper, with the same capabilities. So, why did NVIDIA’s market value drop $600 billion in a day?

Dr. Vasanth Sarathy: Whenever you work with a large language model—I mentioned before that you have a lot of training data, trillions of words from the internet—in order to process all of this and train the model effectively within a reasonable amount of time, you need a lot of computational power, also called “compute,” and you need a lot of data, plus a way to store the data and move it around. And, like I said before, these models have lots of little knobs that get adjusted—some of these large language models have hundreds of billions of them. When the deep learning revolution really kicked off in the early 2010s, people started using what are called “GPUs,” short for graphics processing units. These are little chips in your computer that handle what you see on the screen—graphics and so on. They were heavily used by the video game industry, because you needed really fast graphics processing to make games smoother and more realistic. NVIDIA was primarily selling GPUs to the world of video games, but when the deep learning revolution came along, they pivoted quite a bit, and although they still make video game graphics cards, a lot of deep learning applications started using them.

When these large language models are training, every step of the training process typically uses GPUs to make it more efficient. The reason GPUs make things more efficient is that they run things in parallel, which means they can run many millions of these computations at the same time, as opposed to one after the other, and because of that, you get a big speed-up. GPUs are a very important technology that sits at the very bottom, hardware level of all of these LLMs, and so, to train them, you need a lot of GPUs. When you finish training a model and you’re serving it up to the general public to use, you also need GPUs to load up the model and run it. And so, GPUs play a role in all of these different places, and the big companies—Amazon Web Services, Microsoft’s Azure, and Google Cloud—have data centers that are filled with GPUs.
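
As a rough illustration of that parallelism, here is a minimal sketch comparing the same matrix multiplication on a CPU and a GPU—assuming PyTorch is installed and a CUDA GPU is available:

```python
import time
import torch

# A big matrix multiply is millions of independent multiply-adds,
# exactly the kind of work a GPU runs in parallel. This compares the
# same operation on CPU and GPU (assumes a CUDA-capable machine).
n = 4096
a_cpu, b_cpu = torch.rand(n, n), torch.rand(n, n)

start = time.perf_counter()
a_cpu @ b_cpu
cpu_time = time.perf_counter() - start

a_gpu, b_gpu = a_cpu.to("cuda"), b_cpu.to("cuda")
torch.cuda.synchronize()          # make sure the copy to the GPU has finished
start = time.perf_counter()
a_gpu @ b_gpu
torch.cuda.synchronize()          # GPU calls are asynchronous; wait for the result
gpu_time = time.perf_counter() - start

print(f"CPU: {cpu_time:.3f}s  GPU: {gpu_time:.3f}s")
```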

Now, the reason NVIDIA’s stock dropped roughly $600 billion in market value was a reaction—and some might argue, an overreaction—by the market to the notion that DeepSeek was made much more cheaply. To build OpenAI’s ChatGPT, it is believed that they spent about $100 million. DeepSeek came out and said, “It cost us $5 million and significantly fewer GPUs.” Moreover, because of export controls, they had a less capable version of the NVIDIA GPUs that we have here in America, and so they were able to do all of that with fewer, weaker GPUs and produce results comparable to the other top models. I think the general thought was that maybe GPUs aren’t the most important thing in the world, and maybe, as a result, NVIDIA might end up selling fewer of them. That was the market’s initial reaction, or overreaction.

Regina Sam Penti: That’s very helpful. It looks like there have been some assumptions built into at least the cost structure of developing different AI models, and maybe the applications that run on top of them. For some of our clients and other listeners who, as part of their jobs, have to think about valuation and investing in AI companies, and really figuring out where the future returns and the spend are going to be—what does this mean? What does this mean for funding for startups and valuation in the AI space?

Dr. Vasanth Sarathy: There is a lot of money being poured into AI right now, and where it’s being poured in matters quite a bit. A lot of observers have noted that even though the GPU requirements might be lower, you might end up seeing more GPUs being sold because of what’s called “Jevons paradox”: when a resource becomes cheaper to use, overall consumption of it often goes up rather than down. The market really does want intelligent systems, so there is still demand. Many people predict that sales of GPUs are actually going to go up.

The way to think about the entire ecosystem of large language models is as a stack, just like a stack of papers. At the very bottom of that stack is hardware, like your GPUs. Those GPUs are needed by the next level in the stack, which is data centers. Data centers have a whole bunch of computers, and therefore, they buy all those GPUs. Those data centers are what companies like OpenAI, Meta, and DeepSeek use to actually train and run their language models—so that’s one level of the stack. Then, if you go up one more level, people—startups especially—take the large language models that have already been trained and built by others and use them in various applications. So, you have it used for search. You have it used in enterprise settings—that’s a big boom right now in terms of AI apps being created. But they’re all created at the top level of the stack. What’s interesting is that the bottom two levels—where the GPUs and the data centers are—is where a lot of the investment was going, and it costs a fair amount of money. NVIDIA’s sales last year were about $150 billion, which roughly means they sold $150 billion worth of GPUs. If you think about investments in data centers, you can double that $150 billion, because GPUs are only part of a data center’s cost—so that’s about a $300 billion investment in data centers. Then take the next level up, the apps: the companies using that data-center infrastructure to build their own apps are going to want maybe a 50% margin, and so you need roughly a $600 billion market.
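
Here is that back-of-envelope arithmetic written out as a tiny script—the figures are the round numbers from the discussion, not audited data:

```python
# Back-of-envelope version of the stack economics described above.
# All figures are the rough, round numbers from the discussion.
gpu_sales = 150e9                      # ~$150B of GPUs sold in a year
data_center_capex = 2 * gpu_sales      # GPUs ~half of a data center build-out cost
app_margin = 0.50                      # margin the app companies might want
required_app_revenue = data_center_capex / (1 - app_margin)

print(f"Implied AI app revenue needed: ${required_app_revenue / 1e9:,.0f}B")  # -> $600B
```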

Now, the hope behind this whole AI system is that it all pays off. You put a lot of money into the bottom levels. You get all the data centers going. Then, these apps come along, and hopefully, the revenue they generate makes up for those investments. The problem is, right now, it doesn’t—there’s a big gap. David Cahn of Sequoia Capital has a great article about exactly this—he calls it AI’s $600 billion question. It’s about this difference between the piece that’s actually making revenue and the investments that are just being pumped in right now. DeepSeek, as I mentioned before, is an open-source model, and they published a paper telling us exactly how they did it. Other people can replicate that, which means others can develop foundation models that are also cheaper and require fewer GPUs. And so, it may be that investment moves up the tech stack. There might be more AI app companies coming out and more investment in the app space. I think a lot of the VC and startup investment is going to be in that space, because there are so many opportunities right now for AI app developers to add value across the board—from search, with Perplexity AI, to Elicit for research papers, Harvey for legal searches, OpenEvidence for medical searches, and so on. There are all kinds of tools that help enterprises manage their own documents, ways for AI tools to use existing documents, and ways to help internal users work with them.

That said, what’s interesting in the last couple of days is that you see a lot of investment by these big companies in data centers, which you might think contradicts what we’ve been talking about so far. What it means is they’re not giving up on what are called the “scaling laws”—the observation that if you put in more compute and more data, you get better results. Just because DeepSeek came out and made something cheaper doesn’t mean that trend stops; you can potentially still keep adding compute and data and keep getting better performance. There is obviously still investment going in there—we saw the Stargate Project, where OpenAI and some others are investing heavily in data centers in the United States. So, I think that piece is still going on, but I do expect a lot more investment and a lot more AI app developers at the top level of the stack.

Regina Sam Penti: Basically, you’re saying it’s going to cause the investments to be distributed a bit more evenly across the stack—that’s excellent. One of the key features highlighted in the coverage around DeepSeek is the fact that it’s an open-source model. Can you speak a little to the significance of that, what it means for AI innovation, and, more importantly for our listeners, what it means for companies that may be looking to swap out their existing models?

Dr. Vasanth Sarathy: Having an open-source model is wonderful for the research community, for the app development community—it’s wonderful for a whole bunch of different people who need it. The whole idea of being open source is that you publicly release all of the model weights—which are essentially the knob settings—along with any code needed to run the model, which means anybody, you and I, with the appropriate computers, can download it and run it on our own systems. That’s really powerful. Meta has been doing this for years with their Llama models—they’ve all been open-sourced. It’s been great for the community—app builders rely on that.

One big benefit of an open-source model is that you can download the whole model and run it locally on your own computer, or on your business’s computers, which keeps it private. When you ask the large language model questions, nothing goes out somewhere—it stays within the company’s bounds. You can fine-tune it with your own proprietary data without worrying about sensitive information leaving the company. The alternative would’ve been sending your data through APIs to OpenAI, in which case you might have had to sign a whole bunch of confidentiality and data-security agreements to make sure that’s safe. With this kind of model, you don’t have to worry about that—you can keep it in-house.
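
As an illustration, here is a minimal sketch of running one of the openly released DeepSeek-R1 distilled checkpoints locally with the Hugging Face transformers library. The specific model name is an assumption—pick whichever checkpoint fits your hardware—and the larger checkpoints need substantial GPU memory:

```python
# Minimal sketch of running an open-weight model locally, so prompts
# never leave your machine. Assumes `transformers`, `torch`, and
# `accelerate` are installed; the model name is one of the published
# DeepSeek-R1 distilled checkpoints (an assumption -- substitute your own).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Explain Jevons paradox in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```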

Regina Sam Penti: Given just how often I see this question and concern around sensitive data from our clients when they’re looking into AI adoption, I can see that being a huge selling point—the ability to realistically run a model that’s localized to your own environment. Thank you for highlighting that. Some have said that DeepSeek represents the commoditization or democratization of AI. Do you agree with that sentiment?

Dr. Vasanth Sarathy: It’s one of many things that have contributed to that. I don’t know about the democratization part, but the commoditization part, for sure—there’s already been a trend in that direction. I’m not particularly convinced that DeepSeek has changed or accelerated it tremendously in any way, shape, or form. Like I said, it’s made things a lot cheaper—that definitely helps. But it doesn’t change what was going to happen anyway. People want to build intelligent systems, and to build intelligent systems, they want to incorporate large language models. Now, large language models have limitations—there are some things they can’t quite do yet. So, all of these companies are trying to make them more performant: doing more reasoning, doing smarter reasoning, and being just generally smarter. DeepSeek came out and said, “We can do this a lot cheaper, and it’s just as smart as the other models.” That is going to help with increased adoption of DeepSeek-like models in the AI app space. You get higher performance for a cheaper price, which definitely helps with commoditization—and you’ll especially need that higher performance when you start taking on really complex tasks. These models start hitting walls pretty quickly once you have them do the whole range of things a human would be able to do, but as they get more and more performant, they’re able to do those tasks better, which, again, is a motivator for building more complex AI apps. And because of the open-source aspect, you’re able to fine-tune it for your own specific purpose. Fine-tuning is a process in which you take an existing model that’s already been trained, and then you give it your specific kind of task and your specific kind of data—it doesn’t have to be proprietary—so that it performs really well on that one task.
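
To make the fine-tuning idea concrete, here is a minimal sketch using LoRA via the peft library, which trains a small set of added “knobs” rather than all of the model’s weights. The model name and target modules are assumptions to adapt to your own task and data:

```python
# Minimal sketch of parameter-efficient fine-tuning with LoRA (via the
# `peft` library). Instead of updating all of the model's weights, LoRA
# adds small trainable adapter matrices. Model name and target_modules
# are assumptions -- adjust them for your own checkpoint and task.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
model = AutoModelForCausalLM.from_pretrained(model_name)

lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a tiny fraction of weights will train

# From here, you would train on your task-specific examples using a
# standard training loop or the transformers Trainer, all on in-house data.
```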

Regina Sam Penti: We’ve talked about DeepSeek—the app, the model—but less about the origins, so I want to just switch tack a little bit. A lot has been made or said about the fact that DeepSeek came out of China, that the data that it collects would be stored in China, and it’s raised questions around whether U.S. companies should adopt DeepSeek at all and whether export controls work. What’s your take on these issues? Are there data and cybersecurity, or even national security issues that we should be concerned about?

Dr. Vasanth Sarathy: Yes, that was a big aspect here. In fact, some people have argued that the markets dropped because of a belief that China is now ahead in the AI race because of this DeepSeek release, so I want to address that point first. The idea that China is somehow unique and, therefore, ahead because they produced this model—I’m not entirely buying it. This is just a result of having constraints. They didn’t have the same kind of hardware, and maybe they had less money, but they came up with a smart way to do the same thing, which could just as easily have happened in an academic lab in the United States. Academics, including myself, have limited resources—we don’t have the $100 million that OpenAI has—and so, we come up with smart solutions. So, I don’t find the framing of this as part of an AI race between China and the United States that relevant.

That said, DeepSeek has two versions. I’ve talked about the open-source version a lot, but what they released first was actually the app version—basically, like ChatGPT, where you can type in your questions and get a response. The app works through an API, which means anything you type in goes to servers in China, gets processed by the DeepSeek model they have there, and is then sent back to you as a response. If you look at their privacy policy, they collect the data you typed in, but in addition, they collect your location information, everything about your device, and your keystrokes—all the keystrokes you used while you were in the app. It’s unclear whether they collect keystrokes while the app is running in the background without being open—regardless, there’s a lot of data that goes right back to DeepSeek’s servers.

Now, the reason it matters in the China case is that the Chinese government does have access to those servers, and the fear is that they get access to data from the United States this way. That is potentially a privacy concern—but, again, one that may not be as big if you use the open-source model. If you use the app, then, yes. And you can see the parallels between DeepSeek and TikTok here: it’s a Chinese company with a very popular app, and there might be similar issues with this as well. We’re already seeing it—many states are banning the use of DeepSeek for government employees, across the board, regardless of political affiliation. So, there’s going to be that kind of reaction for sure, because they’re worried about the data going to China.

There’s another angle here, which I think is very interesting. If you ask DeepSeek very sensitive questions about China, the Chinese government, or Taiwan, the answers that come out are heavily censored, and so, that was a concern. The issue was: How much of this model is actually censored? And, worse, how much of it is disinformed or misinformed—trained specifically to give you the wrong kind of information? All of those issues are still there, and that’s a big concern, obviously. Again, you can take some of it out when you have the open-source model. In fact, Perplexity, the company with an AI search engine, also uses DeepSeek behind the scenes. They have reported that the open-source model doesn’t have the same censorship limits that the app does—they’re able to ask questions about, say, Tiananmen Square and get answers directly from DeepSeek. Presumably, the censors and those kinds of controls are only in the app. Still, although it’s an open-source model, we don’t know all the data it was trained on. We might have some guesses, but we don’t really know. That, I think, captures all of the China problems in one bucket: we don’t know a lot of things. We don’t know for sure that they only spent $5 million training this. We don’t know for sure that they only used the resources they claimed they used. And we don’t know for sure what data they trained the model on. There’s a lot of uncertainty about this model from that perspective, and so, people are nervous about incorporating it into their products, understandably so.

Regina Sam Penti: That makes sense. I think those would be very important considerations for our listeners who are maybe considering whether or not to adopt the model or try out the API. That’s a nice segue to my next question, which is: What sort of general advice would you give to businesses considering adopting DeepSeek, whether it’s the model or the API?

Dr. Vasanth Sarathy: I would say don’t use the API. I think using the model is worthwhile. If you have AI engineers capable of analyzing the model and running it locally, then you’re able to fine-tune it, train it, and do all of that safely, because you have it locally on your machines. That said, if bans are imposed later on this model, that might affect your business processes. You should be able to sub out one model for another, so that if you can’t use DeepSeek anymore, you can switch in something else and try it out. Most companies do this already—they work with lots of different models and figure out which one produces the best responses for their use case and their app, both in terms of performance and accuracy and in terms of style. People are already comparing different models as they use them in their systems, so I don’t see that being any different with DeepSeek. And new models are coming out constantly, so you’re going to see more and more of this, and people are getting comfortable swapping one out for another.
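
A minimal sketch of what “being able to sub out one model for another” can look like in code—the provider classes are illustrative placeholders, with canned strings standing in for real calls to a locally hosted model or a vendor API:

```python
# Minimal sketch of a model-agnostic interface, so one model can be
# swapped for another without touching the rest of the app. The classes
# below are illustrative placeholders, not real SDK wrappers.
from typing import Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class LocalDeepSeek:
    def complete(self, prompt: str) -> str:
        # A real implementation would run a locally hosted open model here.
        return f"[local DeepSeek-style reply to: {prompt!r}]"

class VendorModel:
    def complete(self, prompt: str) -> str:
        # A real implementation would call a hosted vendor API here.
        return f"[vendor model reply to: {prompt!r}]"

def answer(model: ChatModel, question: str) -> str:
    # The app depends only on the interface, so swapping models -- say,
    # if one gets banned or benchmarks poorly -- is a one-line change.
    return model.complete(question)

print(answer(LocalDeepSeek(), "Summarize this contract clause."))
print(answer(VendorModel(), "Summarize this contract clause."))
```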

Regina Sam Penti: That’s super helpful. Of course, this is AI, so we know there are probably already 10 hot new models that people are talking about. Are there other generally emergent trends in AI?

Dr. Vasanth Sarathy: Adding to what we said before about how app developers are going to swap different models in and out—it’s not just the model these developers are using; they’re building on top of their own existing systems, and they’re going to have a particular use case. One very common use case now is for these models to, say, read internal documents and return a result. That’s an example of what’s called “agentic AI,” where the AI is an agent, meaning it can take its own actions outside of the language model in order to answer a question better. For example, if you wanted to know what the temperature in Paris is right now, the model itself won’t be able to answer, because it was trained months ago and doesn’t have that latest information. If it was an agent, it would be able to access, say, The Weather Channel or some other service, find the answer, and return it in a way that’s relevant to you. Maybe your question wasn’t, “What’s the weather in Paris?” Maybe it was, “Is today a good day to go to the museum in Paris?” That’s a more general question, which requires the language model to look up the museum’s hours and the weather conditions, and to see if there are any traffic delays. All of these are external requests the model is making, but it synthesizes all of that knowledge and then tells you at the end, “Actually, today is an okay day. Maybe tomorrow’s a little better.” That’s a very valuable use case. This year, it seems like there are going to be a lot more agentic-AI or AI-agent apps coming out, so I think that’s a big, big trend.
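
Here is a minimal sketch of that tool-use loop, with stub functions standing in for the real weather and museum-hours services. A real agent would let the model itself decide which tools to call; here the choice is hard-coded to keep the control flow visible:

```python
# Minimal sketch of the "agentic" pattern described above: the system
# gathers outside information via tools, then an LLM would synthesize
# an answer from the observations. The tools are stubs standing in for
# real services (a weather API, a museum-hours lookup, etc.).
def get_weather(city: str) -> str:
    return "sunny, 18C"          # stub: a real agent would call a weather API

def get_museum_hours(city: str) -> str:
    return "9:00-18:00"          # stub: a real agent would look this up live

TOOLS = {"weather": get_weather, "museum hours": get_museum_hours}

def agent(question: str) -> str:
    # Collect observations from each tool (a real agent would pick tools
    # based on the question, via the model's own reasoning).
    observations = {name: tool("Paris") for name, tool in TOOLS.items()}
    context = "; ".join(f"{k}: {v}" for k, v in observations.items())
    # Here the LLM would turn the observations into a natural answer:
    return f"Given {context} -- today looks like a fine day for the museum."

print(agent("Is today a good day to go to the museum in Paris?"))
```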

Another big trend along these lines is what people in the field call “multimodal AI.” So far, we’ve been talking about language models as text—you put in text, and you get out text. But people are also building visual language models, which means you can send in images, you can have it generate images, much like DALL-E and Midjourney, and you can also have it understand images and work with them. ChatGPT already does that to some extent, but there are limitations in that technology, and we expect it to get much better over the next couple of years. And because running these models is getting so much cheaper, not only can AI apps be developed cheaply, but, with power consumption so much lower, you can start putting models on devices that don’t have much energy to spare. The buzzword here is “edge AI”—AI on edge devices, which are special-purpose devices like your Ring doorbell or your Nest thermostat. These are devices that could potentially benefit from an AI use case. We’re not at the point yet where we can put an AI chip in these things, but that’s the direction we’d be going in as things get more efficient.

Regina Sam Penti: That’s exciting—so much to look forward to. As with all AI models and platforms, it’s critical to carefully review DeepSeek’s terms of service, privacy policy, and other related agreements to understand the legal risks that come with using its AI tools. We discussed these and other legal issues in a Ropes & Gray alert that was sent out in late January, and I’m happy to share that with anyone who may be interested. With that, this has been a really insightful discussion. Please join me in thanking Dr. Sarathy. For those listeners who may be interested in more of our podcasts, they are available wherever you listen to podcasts. Thank you very much for tuning in.
