Most organisations appreciate regulatory certainty. It is almost always better to know — and to be able to plan for — what is expected of them, even when the requirements may be challenging.
On 18 December 2024, the European Data Protection Board published what was by some distance its most anticipated pronouncement of the year: Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models. In the lead-up to its release, many businesses were hopeful that the Opinion would provide clarity on how to interpret, understand and apply the GDPR to their development and/or deployment of artificial intelligence-enabled products and services.
For the most part, however, the Opinion does not provide that clarity. And although it is helpful in places (often for what it does not say), the Opinion will likely leave many readers with more questions than answers.
That is because the EDPB’s conclusion on the two questions that comprise the majority of the Opinion — Can an AI model be anonymous? Can legitimate interests be used as a lawful basis for processing personal data in an AI model? — is, in each case: maybe. The Opinion uses the phrases “case-by-case basis”, “case-by-case assessment” and “context of the case” 19 times, and makes clear that the concrete interpretation of these questions will ultimately be left to supervisory authorities to work out. Not to mention the developers and users of the AI models themselves.
Still, the EDPB is right to recognise that the application of the GDPR to AI models “raises systemic, abstract and novel issues” — issues that cannot be addressed in a single document. (Indeed, the Opinion does not engage at all with two of the most pressing issues for AI model development: sensitive personal data and purpose limitation.) And the huge variety of AI models and use cases, and the limited scope of the Opinion, means that it may have been wishful thinking to expect the EDPB to provide any type of roadmap for compliance.
With that context in mind, let’s turn to what the Opinion does say.
Can an AI Model be Anonymous?
Yes — in principle.
The EDPB confirms, albeit using the qualified phrasing described above, that AI models trained on personal data can be anonymous. For a model to be considered anonymous, the likelihood of both (i) direct (including probabilistic) extraction of individuals’ personal data that was used to train the model and (ii) obtaining such personal data from “queries” should be insignificant, taking into account all the means reasonably likely to be used by the model developer or another person.
In practice, this is likely to be a very high bar. Organisations wishing to take the position that their models are anonymous will need robust documentation — including of the technical and organisational measures taken throughout the lifecycle of the model — to support their claims.
The Opinion notably does not address, at least not explicitly, the view taken by some European data protection authorities that large language models do not store personal data. However, the fact that the EDPB accepts that not all AI models trained with personal data will be anonymous, and that the anonymity assessment includes whether a model’s outputs contain personal data, suggests that it disagrees with this position. Given the degree to which the EDPB has put the onus on national regulators to interpret both the Opinion and the GDPR’s provisions in this respect, we may well see further differences in approach between authorities on one of the most fundamental questions in this space. The EDPB is also expected to produce (non-AI specific) guidelines on anonymisation in the coming months, which will make for particularly interesting reading in light of the Opinion.
Can Legitimate Interests be Used to Develop an AI Model?
Yes — in principle.
The Opinion does not state that legitimate interests cannot be an appropriate lawful basis for developing and deploying AI models. Rather, it uses a series of linguistic qualifiers to conclude that an organisation may have a legitimate interest in processing personal data for its AI model.
The EDPB reiterates the three-part test under Article 6(1)(f) of the GDPR for relying on legitimate interests, namely: (i) a legitimate interest is pursued by the controller or a third party; (ii) the processing of personal data is necessary to pursue that interest; and (iii) the interest is not overridden by the interests or fundamental rights and freedoms of the data subjects.
The Opinion provides three examples of processing that may constitute a legitimate interest: (i) developing a conversational agent to assist users; (ii) developing an AI system to detect fraudulent content or behaviour; and (iii) improving threat detection in an information system. However, the latter two examples are similar to the legitimate interests described in Recitals 47 and 49 of the GDPR, respectively, such that organisations hoping for a list of approved, AI-specific legitimate interests from the EDPB may well be disappointed.
By the same token, the Opinion does not explicitly state that particular processing activities are not legitimate, including, notably, the development of AI models trained on web-scraped data. Indeed, the Opinion lists specific mitigating measures for controllers to consider in the context of scraping, including:
- Excluding content from publications that could entail risks for individuals were their information to be made public.
- Excluding certain data categories or sources, such as from websites whose subject matter is particularly sensitive.
- Excluding collection from websites that clearly object to scraping and the reuse of their content.
Given the multiple and overlapping considerations at issue when assessing legitimate interests in the context of AI development and deployment (including individuals’ reasonable expectations and mitigating measures, among other things), this is not the type of processing that should be shoehorned under legitimate interests simply to avoid a more onerous basis (i.e., consent) or another lawful basis entirely. Rather, reliance on legitimate interests will require a thorough assessment, balancing and documentation of the respective interests.
Are Users of AI Models Exposed to Liability?
Many organisations do not, for now, develop AI models, and so could be forgiven for thinking that the Opinion is not relevant to them. Importantly, however, the EDPB makes clear that the use of AI models whose development involved unlawful processing of personal data may expose both the controller that developed the model and the controller that deploys it to liability.
Where that is the case, national regulators will be expected to take into account whether the party acquiring the AI model conducted an “appropriate assessment” to determine that the model was not developed by unlawfully processing personal data. As such, the Opinion serves as an important reminder for deployers of AI tools: while the use of those tools by their employees is usually thought of as the primary risk to organisations (and indeed it often is), they should not overlook the need to understand the risks inherent in the AI model itself and confirm, to the best of their ability, that those risks have been addressed.
It is reasonable to assume that regulatory investigations will generally focus on, or at least start with, the developer of an AI model. However, given the patchwork of approaches that could be taken by authorities across the European Union, it is certainly possible, given the right fact pattern, that users of such models also, or instead, face scrutiny: for example, where a deployer modifies a white-label solution and becomes a quasi developer-deployer, or where a deployer’s use of a standalone model results in particularly egregious risks to, or outcomes for, individuals.
Conducting and documenting due diligence should therefore form a critical part of your AI contracting playbook (if it is not already). This exercise can be easier said than done, particularly when acquiring off-the-shelf tools or where the acquiring party has limited bargaining power. But doing so will in most cases serve as mitigation in the event of regulatory intervention, as well as helping you to assess and comply with your wider data protection obligations.