Generating discussion: Data protection and AI
The Information Commissioner’s Office (ICO) has launched a consultation series on how aspects of data protection law should apply to the development and use of generative AI models. In this article, I will summarise the first two consultations and, where applicable, share MRS’ response to the ICO consultations.
Generative AI refers to AI models that can create new content – for example, text, computer code, audio, music, images and videos. Typically, these models are trained on extensive datasets, which allows them to exhibit a broad range of general-purpose capabilities.
Generative AI can autonomously generate several types of new outputs. A model trained on one dataset could be used to improve customer interactions through enhanced chat and search experiences, or to assist with repetitive tasks such as replying to requests for proposals. The first call for evidence from the ICO addressed the lawful basis for web scraping to train generative AI models.
What is web scraping?
Web scraping involves the use of automated software to ‘crawl’ web pages, gather, copy or extract information from those pages, and store that information (such as in a database) for further use. The information can be anything on a website, including images, videos, text or contact details.
Information scraped from internet environments such as blogs, social media, forum discussions, product reviews and personal websites may contain personal data that individuals have placed there. Additionally, these may contain information that was not placed there by the person to whom it relates (such as discussion forums, leaked information).
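To make the description above concrete, here is a minimal sketch in Python of the extract-and-store step, using only the standard library. It is an illustration under stated assumptions, not a description of how any particular AI developer gathers data: the sample page, the `PageExtractor` and `scrape_and_store` names and the `pages` table are all hypothetical, and the HTML is supplied as a string rather than downloaded from a live site.

```python
import sqlite3
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects visible text fragments and link targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the target of every <a href="..."> so a crawler could follow it.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Keep the non-empty text fragments found between tags.
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def scrape_and_store(url, html, connection):
    """Extract text and links from already-downloaded HTML and store them."""
    extractor = PageExtractor()
    extractor.feed(html)
    text = " ".join(extractor.text_parts)
    connection.execute(
        "INSERT INTO pages (url, text, links) VALUES (?, ?, ?)",
        (url, text, "\n".join(extractor.links)),
    )
    return text, extractor.links


# Hypothetical page content; a real scraper would download this over HTTP.
SAMPLE_HTML = """
<html><body>
  <h1>Product review</h1>
  <p>Great service. Contact me at jane@example.com</p>
  <a href="/reviews/2">Next review</a>
</body></html>
"""

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE pages (url TEXT, text TEXT, links TEXT)")
text, links = scrape_and_store(
    "https://example.com/reviews/1", SAMPLE_HTML, connection
)
```

Even in this small example, the stored text includes an email address that the reviewer placed on the page, which is the kind of personal data the ICO's consultation is concerned with.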
In the first consultation, the ICO has analysed whether legitimate interest (LI) is a valid lawful basis for training generative AI models on web-scraped data. In the ICO’s view, LI can be a valid lawful basis for training generative AI models on web-scraped data, but only when the model’s developer can ensure they pass the three-part test:
- Is there a valid interest?
- Is web scraping necessary, given the purpose?
- Do individuals’ rights override the interest of the generative AI developer?
Controllers and developers will need to work through these considerations, which will allow them to assess thoroughly whether the processing is appropriate, what risks it carries and whether it infringes on individuals’ rights.
Purpose limitation in the generative AI life-cycle
The second chapter focuses on how the data protection principle of purpose limitation should be applied at different stages in the generative AI life-cycle. This requires organisations to be clear and open about why they are processing personal data, and to ensure that what they intend to do with it is in line with individuals’ reasonable expectations. There must be a lawful basis for processing data, and the purpose must not be in breach of other laws.
The generative AI model life-cycle involves several stages. Each stage may involve processing different types of personal data for different purposes. This can make it challenging for developers and controllers to set a singular purpose or have the purpose clear at the outset of the project, as well as to clarify the delineation of roles between controllers and processors.
However, the ICO expects this to be well considered, even where different organisations have control over different purposes, which can itself help to delineate the boundaries between those purposes.
Nonetheless, having a specified purpose at each stage of the generative AI life-cycle is essential. It allows organisations to understand the scope of each processing activity, mitigate risk, evaluate their compliance with data protection law and evidence that compliance.
The ICO is certainly at the inception stages of its analysis around regulating generative AI, and is seeking input from industry to inform future planning and decision making.
MRS has submitted a response to the second chapter of the ICO consultation and will continue to input into later chapters. At present, our response is seeking more clarity from the ICO.
In our response, we asked the ICO to consider and address three key concerns and questions:
- The exemptions provided in Article 89 and Recital 162 of the General Data Protection Regulation. In essence, these provisions mean that if research is conducted for historical, scientific, or statistical purposes, the onerous burdens for further processing are reduced, provided that safeguards are ensured.
- With regard to the reuse of data gathered for research purposes: if research data is gathered for one research project and client, and the data is used to train research models (for example, to create synthetic data models, which may be used repeatedly for other research projects), is it reasonable to gather consent for ‘research purposes’ without having to explicitly state the specific types of research project that the data may be used for in the future?
- And finally, will the ICO provide guidance for determining when users of generative AI model data become controllers and when they are processors? Users seldom have access to identifiable personal data; this data is only accessible to the owner of a generative AI model. Therefore, if models are used, will the users always be processors, or will there be instances where the users could be joint controllers?
We hope to gain more understanding from the ICO and contribute to the ongoing discussions on regulating generative AI. It is imperative that there are clear considerations and planning around the delineation of responsibilities (this is already a challenging matter across the industry), and that there is clarity around the exemptions that should be provided in accordance with Article 89 and Recital 162 of the GDPR, and the reuse of data.
This article was first published in the July issue of Impact.

Research Live is published by MRS.