Decoding Digital Health: Artificial Intelligence Unveiled—The Science Behind AI

Podcast
July 25, 2024
25:58 minutes

On this episode of Ropes & Gray’s podcast series, Decoding Digital Health, Megan Baca, co-leader of the IP transactions and licensing practice and co-chair of the firm’s digital health initiative, is joined by IP transactions counsel Georgina Suzuki and technical advisor Andrew O’Brien as they explore the science behind AI. They discuss how AI works, including the different categories of AI and its evolution, while tackling tougher questions such as how to spot “snake oil” in the AI industry. Whether you're in the legal field, in business development, or just AI-curious, this podcast is for anyone wishing to raise their IQ on AI.


Transcript:

Megan Baca: Welcome to Decoding Digital Health, a Ropes & Gray podcast series focusing on legal, business, and regulatory issues impacting the digital health space. I’m Megan Baca, a partner in our Silicon Valley office, and I co-lead our IP transactions and licensing practice. I focus on life sciences, technology, and AI-related licensing and collaboration transactions, and I also lead our digital health initiative. On this episode, I’m joined by my colleagues, Georgina Suzuki, based in Silicon Valley, and Andrew O’Brien, based in New York.

Before we get going with them, though, I also want to mention that Georgina and I are hosting a webinar on July 30 called “Leaky Models, IP Ownership, and Other Key Considerations for AI Dealmaking in Life Sciences.” I’m really excited for that, because we’re going to be exploring AI deal trends and key considerations for licensing and collaboration deals in this space where we’re combining life sciences with AI. Information on signing up for that free webinar will also be in our show notes, or if you can’t make the webinar on time, just check back later for a recording, and we’ll post it there for you to catch on your own time.

Georgina, welcome. Why don’t you introduce yourself and our special guest, and the topic for this episode?

Georgina Suzuki: Thanks, Megan. Hi, everyone. My name’s Georgina Jones Suzuki. I’m IP transactions counsel based in Silicon Valley, and my practice focuses on the intersection of life sciences, technology, and digital health. We’re very excited to have a podcast today that focuses on the technical aspects of AI. As lawyers, or even as business development professionals, we don’t always know how AI works, and we have Andrew here today to help walk us through some of those issues. Andrew O’Brien is a technical advisor in Ropes & Gray’s New York office. Andrew has a PhD in computer science from Drexel University, and he specializes in AI. In preparing for this podcast, I asked Andrew to give me a fun fact about himself relating to the topic, and I learned that, as a kid, he used to watch the Disney Channel a lot. His favorite Disney TV movie was called Smart House, which was about a family living in an artificially intelligent house that they started to lose control over. I suspect this might have planted the seeds for his future career, and we look forward to hearing more of his thoughts today. Andrew, thanks for being on today’s program.

Andrew O’Brien: Thanks for having me.

Georgina Suzuki: I wanted to start off today’s program just understanding at a high level the relationship between various topics we hear in the AI space. We hear reference to “artificial intelligence,” “machine learning,” “neural networks,” and “deep learning.” Can you just walk us through how all those things are related?

Andrew O’Brien: Absolutely, yes. You hear a lot of these words thrown around, often without definition, so I think it is helpful to stop and clarify, at a very high level, what an “algorithm” or a “program” is; they’re more or less the same thing. A computer takes an input, performs a series of well-defined steps one after the other, and then gives you back an output. The best way to think about that is a recipe: if you take these ingredients and perform the steps one after the other as your recipe book says, you get a cake back. That’s basically all an algorithm is, except the steps are things a computer can do, like add, divide, or write to memory. There’s only a handful of them, but in a sense, everything we have in computation comes from that very simple idea of a sequence of steps.
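
To make the recipe analogy concrete, here is a minimal sketch, in Python, of what an algorithm looks like: a function that takes an input, performs a fixed sequence of simple steps, and returns an output. The particular steps are arbitrary and chosen only for illustration.

```python
# A tiny algorithm: take an input, perform well-defined steps one after the
# other (add, then divide), and return an output.
def average(numbers):
    total = 0
    for n in numbers:                # one step per "ingredient"
        total = total + n            # add
    return total / len(numbers)      # divide

print(average([2, 4, 6]))            # -> 4.0
```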

“Artificial intelligence” is the attempt to build algorithms that can, in some sense, accomplish tasks we associate with human intelligence: taking inputs and translating them to outputs in ways that humans seem to be able to do really well but almost nothing else can. Take playing chess: the input is a chess position and the output is a move. How do you translate that input board position into a really brilliant output move? “Intelligence” seems to be the name we give to a lot of those kinds of tasks.

Then the question is, if you want to build algorithms that accomplish these intelligent tasks, how should you do it? One approach that has been really successful starts from the observation that we often can’t write down the set of steps exactly; we don’t even quite know them ourselves. Take computer vision, for example: if I give you a picture of a cat, everyone in the room can tell you it’s a cat, but it’s actually really hard to write down a sequence of steps that could take that image and output “cat.” It’s strangely difficult: our brains just kind of tell us, “That’s a cat,” but explaining it from the pixels is hard.

Computer scientists, especially over the last 25-30 years, have come to this idea of “machine learning,” which asks, “Can you come up with a program that basically learns how to make another program that does the task?” That’s what machine learning is all about. So, if you’ve got an input-to-output mapping you want to learn, and you don’t know how to write the program yourself, you use a machine learning algorithm: you give it a lot of examples of inputs paired with the outputs they correspond to. To use our example from before, a picture of a cat gets the label “cat,” and a picture of something that’s not a cat gets the label “not a cat.” You give a bunch of those examples to the learning program, and it outputs what’s often called a “model,” which is basically the program you wanted to begin with. That model will then take in a picture as an input and output whether it’s a cat or not. Or, in the chess example, it’ll take a position on a chess board and output a move. That’s really why machine learning and artificial intelligence are so tightly linked: there are a lot of these tasks where we had no idea how to write the algorithm’s steps down, but it turns out we could write programs that knew how to extract them from examples.
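
As a rough illustration of that pattern, here is a minimal supervised-learning sketch in Python using scikit-learn. The tiny made-up feature vectors stand in for real images, which would be far higher-dimensional; the point is simply that labeled examples go in and a trained model comes out.

```python
# A minimal supervised-learning sketch: labeled examples in, a trained "model"
# (the program we wanted) out. The small made-up feature vectors stand in for
# real images, which would be far higher-dimensional.
from sklearn.linear_model import LogisticRegression

X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]  # inputs (e.g., image features)
y = [1, 1, 0, 0]                                      # labels: 1 = "cat", 0 = "not a cat"

model = LogisticRegression().fit(X, y)   # the learning algorithm produces a model
print(model.predict([[0.85, 0.15]]))     # the model maps a new input to an output -> [1]
```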

Then, lately, there’s been this particular kind of machine learning algorithm called a “neural network” that roughly tries to mimic how human brains work and learn. As we’ve discovered over really the last 12-13 years, these neural networks can be really successful at a whole bunch of tasks that even other machine learning algorithms weren’t that good at. That’s the relationship between those concepts.

Georgina Suzuki: Great, thank you for that. Let’s piggyback a little bit on the “neural networks” concept. I’ve heard that there are different types of neural networks. Can you walk us through some of the different categories?

Andrew O’Brien: Absolutely, yes. It wasn’t that long ago that there really weren’t that many kinds, but now, as the field has gotten so hot in computer science, there has been an explosion of them. So, it’s hard to give a taxonomy of all of them, but there are a couple that really are the dominant ones that you’ll just see over and over again.

The first one is something called a “convolutional neural network,” often referred to as a “CNN,” and that’s the one that tends to deal with your computer vision problems. How it works is loosely based on how the human brain recognizes images: the idea is that mammalian brains have these filters that recognize simple features in images and then, from those features, build out more complex features. The key thing to remember is that it’s for images.
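
For readers who want to see the shape of the idea, here is a minimal, untrained convolutional network sketched in PyTorch. The layer sizes and image dimensions are arbitrary choices for illustration, not anything referenced in the episode.

```python
# A minimal convolutional neural network: convolutional filters extract simple
# image features, deeper layers combine them into more complex features, and a
# final linear layer produces the classification scores.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level filters
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # more complex features
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):                 # x: a batch of 32x32 RGB images
        x = self.features(x)
        return self.classifier(x.flatten(1))

logits = TinyCNN()(torch.randn(1, 3, 32, 32))  # e.g., "cat" vs. "not a cat" scores
print(logits.shape)                            # torch.Size([1, 2])
```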

Next, you tend to have something called an “autoencoder.” An autoencoder is two neural networks that work together to summarize data, or encode it. The idea is that one neural network takes a big piece of data that we want to compress in some way, and it tries to compress it. How we know the autoencoder is compressing it well is if another neural network can take that compressed data and recreate the original piece of data. To give an example, let’s say you have an image. We all know images can be very big, and sometimes we want to shrink them so we can send them through email and such, but we want to preserve a lot of the detail; we don’t want our cat to come out blurry. What an autoencoder might do is take the input image of the cat, compress it down to a smaller data representation, and then, on the other side after we email it, the second neural network can recreate the original cat image very faithfully. How we train it is, we take the input image and compare it to the output image, and we slowly push the network to make them better and better approximations of each other, trying to get them to be the same image, while passing everything through what’s called a “bottleneck,” which is that small representation we create in between.
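
Here is a minimal autoencoder sketch in PyTorch under the same caveat: the encoder, bottleneck size, and decoder are arbitrary illustrative choices, and the “images” are random numbers standing in for real data.

```python
# A minimal autoencoder: one network compresses the input through a small
# "bottleneck", another reconstructs it, and training pushes the reconstruction
# to match the original input.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 32), nn.ReLU())     # e.g., 28x28 image -> 32 numbers
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())  # 32 numbers -> reconstructed image

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
x = torch.rand(16, 784)                  # a batch of (fake) flattened images

for _ in range(100):                     # training loop
    code = encoder(x)                    # compress through the bottleneck
    recon = decoder(code)                # attempt to recreate the original
    loss = ((recon - x) ** 2).mean()     # compare output image to input image
    opt.zero_grad()
    loss.backward()
    opt.step()

print(code.shape, loss.item())           # compressed representation and reconstruction error
```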

Another one, probably really the hottest one, is something called a “transformer.” It’s the basis of the large language models you hear about, which are the things you’re seeing everywhere now, like ChatGPT and Claude, and Google has one called Gemini. Transformers are neural networks that are very successful at dealing with sequential data. For a long time, neural networks had trouble when the input was a very long sequence: if you wanted to input a huge sequence of text and have the neural network, say, output an answer to a question about that text, by the time it got to the end of the input it would often have forgotten things from the earlier part, and computer scientists were puzzled by this. Eventually, in 2017, researchers came up with what’s called the “transformer architecture,” a particular kind of neural network that is really good at these long sequences. It uses something called “attention,” the idea that the network learns which other words each word should pay attention to in order to get its meaning, much like how a human, when looking at something, focuses in on certain aspects. The transformer does this for long sequences of text, and that allows it to relate things that might be very far apart in the input. The upshot is that it’s now very good at dealing with things like language, where we can have these long conversations with ChatGPT and it can remember things we told it a thousand words ago.
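
As a rough sketch of the attention idea described here, the following Python snippet computes scaled dot-product attention over a short made-up sequence. The projection matrices are random stand-ins for what a trained transformer would actually learn.

```python
# Scaled dot-product attention, the core operation inside a transformer: every
# position in the sequence computes how much to "pay attention" to every other
# position, then takes a weighted mix of their values.
import torch
import torch.nn.functional as F

seq_len, d = 5, 8                        # 5 tokens, 8-dimensional representations
x = torch.randn(seq_len, d)              # stand-in for embedded words

Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))  # learned projections (random here)
q, k, v = x @ Wq, x @ Wk, x @ Wv

scores = q @ k.T / d ** 0.5              # how relevant each word is to each other word
weights = F.softmax(scores, dim=-1)      # attention weights: each row sums to 1
output = weights @ v                     # each token becomes a mix of the tokens it attends to

print(weights.shape, output.shape)       # (5, 5) attention map, (5, 8) new representations
```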

Then, finally, there’s “diffusion,” which is not so much a single neural network as a process that uses them. If you type in a text description of an image you want to see, say, “I want to see a tomato going to the moon,” an image will be generated of a tomato on a rocket ship heading to the moon. How that tends to work is you take an image and slowly inject what’s called “noise”: you make the image blurrier and blurrier until it’s completely unrecognizable, and then you train a neural network to slowly remove that blurriness. The idea is that if you can do that, the neural network can then recreate images starting from that noisy space, and because text can also be projected down into that space, it can translate text into corresponding images.
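
Here is a sketch of just the forward “noising” half of that process, with made-up data and an arbitrary noise schedule; the denoising network that learns to reverse each step is not shown.

```python
# The forward "noising" process in diffusion: an image is gradually mixed with
# random noise until nothing recognizable is left. A separate neural network
# (not shown) is trained to undo each small step, which is what lets the model
# generate images starting from pure noise.
import torch

image = torch.rand(3, 64, 64)                    # stand-in for a real training image
num_steps = 1000
betas = torch.linspace(1e-4, 0.02, num_steps)    # how much noise to add at each step

x = image
for beta in betas:
    noise = torch.randn_like(x)
    x = (1 - beta).sqrt() * x + beta.sqrt() * noise  # keep a bit less image, add a bit more noise

print(x.mean().item(), x.std().item())           # by the end, x is essentially pure noise
```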

I would say those are really the big ones worth knowing right now: the ones that are commonly used and represent the state of the art.

Georgina Suzuki: Thanks—that’s a really helpful overview. At some basic level, is it correct to assume that many people don’t actually know how AI models work? We have some understanding of how they work, but at a very detailed level, there’s a little bit of an unknowability to this. Is that a correct assumption?

Andrew O’Brien: Absolutely. One of the things I think would stun most people is how little even AI researchers and machine learning practitioners actually know about neural networks. They’re often called “black boxes,” because they’re so large and often so complicated that we don’t know exactly how they’re doing what they’re doing. There’s a whole field in artificial intelligence called “explainable artificial intelligence,” which is focused on trying to come up with ways to interpret what these neural networks are actually doing. There’s this paradoxical thing going on in AI research where we’re creating these things, but we’re learning about them as if we had discovered them in the wild. You have this weird situation where machine learning researchers are frequently surprised by the capabilities of their neural networks, or the networks go wrong in strange ways: you see bias creep in, and things like that. I don’t know how many analogs there are to that in the history of science. It’s interesting, and I think many people would be surprised by how true it is.
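
As one concrete example of what “explainable AI” techniques can look like, here is a sketch of a simple gradient-based saliency map in PyTorch. The model is an untrained stand-in, so the output is not meaningful; it only shows the mechanics of asking which inputs a prediction is most sensitive to.

```python
# A simple "explainable AI" sketch: a saliency map. We ask which input pixels
# the model's output is most sensitive to by taking the gradient of a class
# score with respect to the input image.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))  # toy stand-in "classifier"
image = torch.rand(1, 3, 32, 32, requires_grad=True)

score = model(image)[0, 1]                      # the model's score for, say, the "cat" class
score.backward()                                # gradients flow back to the input pixels
saliency = image.grad.abs().max(dim=1).values   # per-pixel importance map

print(saliency.shape)                           # torch.Size([1, 32, 32])
```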

Georgina Suzuki: That’s absolutely fascinating. I think we’re going to be impressed by what we learn in this space in the years to come. Let’s move on. What exactly are GPUs, which we often hear about, and what role do they play in the AI ecosystem?

Andrew O’Brien: Remember what I said earlier about what “algorithms” and “programs” are: you take an input, you perform a sequence of computational steps on it one after the other (you add, you divide, you save to memory), and then you get an output. As it turns out, a lot of the time we don’t actually need to wait for one step to finish before starting another step. Sometimes you do, sometimes you don’t; when you don’t, that’s called “parallelism.” You can speed up your algorithm a lot if you start doing the steps that aren’t waiting on your current step. What GPUs are really good at is exactly that: performing these arithmetic operations in parallel. There’s a lot of time to be saved in machine learning by performing as much arithmetic as you can simultaneously, instead of one step after another, which is more what your CPU, the chip we’re mostly familiar with, does. It’s funny that this technology actually came out of video games, which is why it’s called a “graphics processing unit”: performing parallel matrix multiplication operations was really important for video game graphics. It turned out they had a secondary use in machine learning, particularly neural networks. Really, we wouldn’t be able to do anything we do now without them; it would just be too slow. They’re one of the big reasons neural networks work now. Neural networks are not a new discovery; they were known back, I think, in the 1950s. But one problem, among many, back then was that the things we wanted to do were just too slow on contemporary hardware. GPUs have given us an unimaginable speedup and are really the workhorse of the contemporary AI revolution. We saw NVIDIA, the company that makes by far the most important ones, just become, I think, the most valuable company in the world (it overtook Apple), which shows you how important they are.
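
To see the difference in practice, here is a hedged sketch that times the same matrix multiplication on the CPU and, only if one is available, on a CUDA GPU using PyTorch. The matrix size is an arbitrary illustrative choice.

```python
# The same matrix multiplication on the CPU and (if present) on a GPU, where
# thousands of the individual multiply-adds are performed in parallel rather
# than one after another.
import time
import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

t0 = time.time()
c_cpu = a @ b                                      # CPU: a handful of largely sequential cores
print("CPU seconds:", round(time.time() - t0, 3))

if torch.cuda.is_available():                      # only runs if a CUDA GPU is present
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()
    t0 = time.time()
    c_gpu = a_gpu @ b_gpu                          # GPU: massively parallel arithmetic
    torch.cuda.synchronize()                       # wait for the GPU to actually finish
    print("GPU seconds:", round(time.time() - t0, 3))
```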

Georgina Suzuki: Great, thanks for that. How does AI relate to open-source and closed-source software development and their related paradigms?

Andrew O’Brien: In some sense, it’s continuous with the world we already know, where one company sells its software privately and keeps the source code closed, versus software whose source code is made available so developers can work on it together collaboratively. You might think of your Microsoft operating system versus a Unix-type OS. There’s a very strong tradition of open source in Silicon Valley and the software world generally. You’re seeing that now with large language models: some have been made open-source, where the code for the model and, let’s call it, the model’s “weights” are made available. Most famously, probably, the company formerly known as Facebook, Meta, has made many versions of its Llama model available. And you have some more closed-source ones; ironically, OpenAI is probably the most famous of those, with its GPT models. The real question you’re starting to see now is whether these open-source models can continue in the face of the huge expense of training them.

Oftentimes, taking the data and training the model costs millions of dollars now, and this is a challenge the open-source community, I think, has never really faced before. Whether you’re going to continue to see an open-source community in this large language model and transformer space, which now dominates AI, I don’t know. I’m hopeful, but we’ll see.

Georgina Suzuki: Can you also give us a quick summary of the history of neural networks? How did we really get to this tipping point in the history of AI?

Andrew O’Brien: Again, something that would surprise a lot of people: the idea that neural networks are recent is just not true. Going back to the ’40s and ’50s, the idea of using computers to mimic how neurons in the brain work had already occurred to researchers; people were writing papers on it and attempting to build them. The real problem was that, as with a lot of inventions in science, the time between the idea’s conception and all the little, smaller inventions you need to make it actually workable can be considerable. That’s what you saw with neural networks: you get the original conception in the ’40s and ’50s, but there were certain problems. It was pointed out that there were a lot of relationships those original networks actually couldn’t learn, and that had to be fixed; it contributed to what’s often referred to as the first “AI winter.” A scientist named Marvin Minsky pointed out that these original neural networks couldn’t learn even some very simple functions. It also turns out that, to make them work, you just need a lot of data, and up until very recently, we just didn’t have anything like the data we needed.

You also needed to find ways to train them, to get them to learn to take the input and map it to an output. In particular, to train neural networks effectively you needed this algorithm called “backpropagation.” It’s a little bit technical, but the idea is that you need to compute the derivative of your model’s loss in order to update its weights. That’s not a super important detail; the point is that it’s a technical, mathematical piece that you really had to wait until the ’70s and ’80s, I think, to get figured out. Basically, to get the additional algorithmic ideas you needed, the amount of data you needed (from modern sensors, the internet, all of this modern data collection), faster computers, GPUs, and a number of other architectural changes to the models themselves, it really takes until about 2012.
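
Here is a minimal sketch of that idea: compute the derivative (gradient) of a loss with respect to a single weight, then nudge the weight in the direction that reduces the loss. Real networks do the same thing across billions of weights; the single-weight problem here is purely illustrative.

```python
# The idea behind backpropagation and gradient descent: compute d(loss)/d(weight),
# then take a small step that reduces the loss, and repeat.
import torch

w = torch.tensor(0.0, requires_grad=True)   # a single weight; real networks have billions
x, target = 3.0, 6.0                        # we want the model w * x to output 6 when x is 3

for _ in range(50):
    loss = (w * x - target) ** 2            # how wrong the model currently is
    loss.backward()                         # backpropagation: compute d(loss)/d(w)
    with torch.no_grad():
        w -= 0.01 * w.grad                  # gradient descent update
        w.grad.zero_()

print(w.item())                             # approaches 2.0, since 2 * 3 = 6
```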

You go from the ’40s and ’50s to about 2012, when there’s this big breakthrough called AlexNet. It’s what we now consider a very simple model, only eight layers, but it greatly improved performance on image recognition. It became clear that this AlexNet model, a convolutional neural network of the kind we discussed before, really was much better at image recognition than anything else. And because image recognition was such an important task for artificial intelligence researchers, this set off, in 2012, an explosion of revived interest in neural networks. It’s, I would say, probably one of the big revolutions in the history of science: before that moment, it was far from clear to many AI researchers that neural networks were the way to go; just a couple of years afterward, it was almost universal in the artificial intelligence community that they were.

Georgina Suzuki: Wow, that’s really interesting, thanks. I’m going to ask you a difficult question next, one I’m not sure anyone really knows the answer to, but it’s one I often think about. Let’s say you’re a company and you’re interested in a certain AI product, whether you’re considering an investment in the company, an acquisition, or some other partnership. How do you figure out which AI product is snake oil versus the real deal? And how do we prevent the next Theranos in the industry when it comes to AI technologies?

Andrew O’Brien: Yes, obviously, that’s a tricky one. But I would say the first thing to watch out for is a simple rule: beware of benchmarks. The way computer science research tends to proceed is that a particular data set or a particular task emerges that everyone working in a given subfield focuses on; that makes different papers comparable. If we’re all trying to recognize the same set of images, we can measure our accuracy and say, “My model’s better than yours.” Oftentimes, when someone is claiming a state-of-the-art result or better-than-human performance, which is obviously a big thing in AI, they’ll point to one of those benchmarks. The problem is that those benchmarks often don’t correspond as well as we’d like to tasks in the real world. I’ll give you an example. It was quite a while ago that it first became known that computer vision models, these convolutional neural networks we’ve discussed, could outperform a human radiologist at diagnosing certain conditions. There was a much slower uptake in the medical field than I think you would have predicted from some of those results, and one problem we kept having was that the benchmarks didn’t correspond to the real world the way we’d like. For example, oftentimes you’d have parts of the world where the images you’d get were blurrier than the images the computer vision system had been trained on, so its performance would be a lot worse. You also got a number of controversial scenarios with facial recognition, where a model was trained on, say, disproportionately Caucasian faces and would perform a lot worse when trying to recognize faces of a different race.

And so, obviously, for things like computer vision systems being used by law enforcement, that was something that was worrying a lot of people—correctly so, I think. I think that’s a general problem.

If you follow a lot of AI news, as I do, you’re always hearing about human-level performance being exceeded on this or that benchmark. If you listened only to those headlines, it would be astonishing that any human has a job anymore: it’s just, “These humans can’t beat the machine,” and, “The machine crushed the human again.” But humans do still have a lot of jobs, actually; unemployment is historically low. I think part of that has to do with the fact that a lot of tasks just aren’t well-encapsulated by these benchmarks. So, to come back to your question, if someone has an impressive benchmark result but can’t get customers to adopt the product in the real world, no matter how impressive that benchmark is, I think that’s a red flag.
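
As a sketch of the kind of diligence check this suggests, the following snippet evaluates the same model on clean inputs and on artificially blurred inputs. The model and data here are untrained, random stand-ins, so the numbers are meaningless; in a real evaluation, a large gap between the two accuracies would be the warning sign that benchmark performance may not transfer to the field.

```python
# Evaluate the same model on clean, benchmark-style inputs and on degraded,
# more realistic inputs (here, blurred), and compare accuracy. Only the
# evaluation pattern matters; the model and labels are random stand-ins.
import torch
import torch.nn as nn
import torchvision.transforms as T

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))  # stand-in classifier
images = torch.rand(100, 3, 32, 32)
labels = torch.randint(0, 2, (100,))

def accuracy(batch):
    with torch.no_grad():
        return (model(batch).argmax(dim=1) == labels).float().mean().item()

blur = T.GaussianBlur(kernel_size=5, sigma=2.0)   # simulate lower-quality real-world images
print("clean accuracy:  ", accuracy(images))
print("blurred accuracy:", accuracy(blur(images)))
```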

Georgina Suzuki: Thank you so much for that answer. That’s really helpful to know. As we get to the end of the program, I want to ask you one final question: What is one thing you think people get wrong about AI that actually matters from a business or legal perspective? What do you wish people knew about AI?

Andrew O’Brien: I would say, how many of the biggest questions in AI will ultimately be not scientific but ethical, moral, and political. When we’re talking about AI, even at a computer science conference, we’re talking in scientific terms: math, data sets, this and that. So there’s, I think, a sense of, “Leave it to the computer scientists; they’ll figure it out.” But that’s wrong, because ultimately the AI is going to act in alignment with some set of values, and choosing those values is not a scientific question. Those are ethical and moral questions, and they’re actually the biggest questions. “How should a missile system weigh civilian lives against hitting its target?” Things like that. I’ve spent maybe too many years studying the scientific part of artificial intelligence, and nothing has convinced me that that part is more important than the fuzzier, humanistic, ethical parts of it.

Georgina Suzuki: Interesting. I think that’s where we’re going to see a big role for lawyers in the future, as more regimes come out, such as the EU AI Act and various state laws, that try to legislate and regulate this space. I think there are going to be a lot of opportunities for lawyers as we figure out how to regulate it.

Andrew O’Brien: Yes, absolutely. I would say this makes the law more important, not less.

Georgina Suzuki: Thank you so much, Andrew. You’ve certainly raised all of our IQs today on this topic, and we really appreciate all your time and effort in walking us through this.

Andrew O’Brien: Thanks for having me on—it was great to be here.

Megan Baca: Georgina, thank you so much for that insightful discussion with Andrew. Thanks to both of you; I definitely learned a lot. For our listeners, we appreciate you tuning in to our digital health podcast series. If we can help you navigate any of the issues we’ve discussed today, feel free to get in touch with Georgina, me, or your usual Ropes & Gray advisor. For more information about our digital health practice and other topics of interest in this space, please visit ropesgray.com/digitalhealth. You can also sign up for our mailing list to receive invitations to our digital health-focused events. And don’t forget to check the show notes for our upcoming webinar on AI and life sciences. Finally, you can subscribe to this series wherever you listen to podcasts, including Apple and Spotify. Thanks again for listening.
