Web scraping protection: How companies can protect their online content from being used by AI providers

Providers and developers of (mostly general-purpose) AI tools often train their systems on data that is subject to the copyright or database rights of third parties. How can they be prevented from using rightsholders’ online content for AI training against their will? Is web scraping permitted under European law? The Ecovis experts answer these and other questions.

European legislation

The new European AI Act emphasises (recital 108) that it does not affect the enforcement of copyright rules as provided for under Union law. On this basis, one might assume that copyright-protected works and databases, even if published online, are protected against reproduction by AI developers who “scrape” content from the internet, as long as the rightsholders have not granted a licence to copy them as training material for AI tools. However, this is a misconception. In 2019, the European Directive on copyright and related rights in the Digital Single Market (the DSM Directive, (EU) 2019/790) introduced an important exception to this long-standing principle of intellectual property law: in short, text and data mining of protected material made publicly available online is permitted, including for commercial purposes, unless the rightsholder has expressly reserved such rights in an appropriate manner.

Machine-readable methods (for example by including rules in a robots.txt file that scraping tools can understand) are considered appropriate in this regard. If rightsholders do not make such a reservation or do not make it in an appropriate manner, they run the risk of no longer being able to successfully take action against third parties who have lawful access to their online content and make reproductions of it for text and data mining purposes.
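To illustrate what such a machine-readable reservation might look like in practice, the following minimal sketch defines a hypothetical robots.txt and checks it with Python's standard `urllib.robotparser` module, the way a well-behaved scraping tool could. The crawler names (`GPTBot`, `CCBot`), the example URL and the helper `may_fetch` are assumptions chosen for illustration; they do not come from the legislation, and whether a given robots.txt counts as an appropriate reservation in a particular Member State is a legal question the code cannot answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: two crawler names (assumed for illustration)
# are excluded from the whole site, all other user agents are allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/articles/some-article"
    for agent in ("GPTBot", "CCBot", "SomeBrowser"):
        print(agent, may_fetch(ROBOTS_TXT, agent, url))
```

A robots.txt file only states the rightsholder's reservation; it does not technically block a crawler that chooses to ignore it.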

We can help you protect your copyrights and train your AI legally.
Lesley Broos, Lawyer, Partner, Kienhuis Legal – Member of ECOVIS International, Enschede, Netherlands

Differences among Member States

Because the relevant European legislation is a directive, each EU Member State has implemented it in its own national legal system, and the rules are therefore not yet fully harmonised. To find out how intellectual property can be effectively protected in the various Member States, the Ecovis consultants recommend that those affected consult a specialist. This of course also applies to developers who want to train their AI tools legally using third-party sources.

Copyright infringement: Pending proceedings

Lawsuits against providers and developers of AI tools are ongoing worldwide. For example, in December 2023, The New York Times initiated proceedings against OpenAI and its partner Microsoft because OpenAI allegedly used millions of news articles from The New York Times without permission to train its AI system. A similar case is pending against Google/Alphabet over its competing AI tools (Bard, Imagen, MusicLM, Duet AI and Gemini).

Scraping personal data

Data protection law also needs to be considered when training AI systems. If the online content in question contains personal data, scraping it can be problematic on that ground as well. It is not without reason that the Dutch Data Protection Authority stated earlier this year that scraping content containing personal data is ‘almost always illegal’: in most cases, informed consent from the ‘scraped’ data subjects would be legally required, and such consent will normally not be present.

For further information please contact:

Lesley Broos, Lawyer, Partner, Kienhuis Legal – Member of ECOVIS International, Enschede, Netherlands
Email: lesley.broos@kienhuislegal.nl

Contact us:

Lesley Broos
Kienhuis Legal NV – Member of ECOVIS International
Pantheon 25
7521 PR Enschede
Phone: +31 88 480 40 00
www.ecovis.com/netherlands/legal