Artificial intelligence: courts weigh in on clash with creatives over copyright
Recent court rulings have begun to define the legal boundaries between AI training and the intellectual property rights of the creative industries. The judgments have raised the question of whether a balance can be struck between the needs of the two sectors, and have focused attention on the obstacles facing creative businesses as they seek to protect revenues from their work.
In November, the UK High Court dismissed copyright claims brought by media company Getty Images against developer Stability AI relating to the training and output of its model, Stable Diffusion. Getty dropped its claims of primary copyright infringement after it was unable to provide evidence of unauthorised copying in the UK. Nick White, Vice-Chair of the IBA Copyright and Entertainment Law Subcommittee, says this highlights the difficulty of reaching a judgment on such claims in the country because ‘all the major AI models […] are not trained here.’
The remaining allegation was of secondary copyright infringement arising from the download and distribution of Stable Diffusion in the UK. Getty alleged that Stable Diffusion was an ‘infringing copy’ because the creation of the AI model’s internal data – its ‘weights’ – would have constituted copyright infringement had it been carried out in the UK. The Court ruled, however, that the weights themselves aren’t infringing copies because they don’t store the original images, only patterns and features learned from the training data.
The Court did find Stability AI liable for limited trademark infringement arising from Getty watermarks appearing in images generated by certain versions of Stable Diffusion. Getty hasn’t confirmed an appeal but said it would ‘be taking forward findings of fact from the UK ruling’ in its ongoing case in California.
There will be some form of meeting in the middle, perhaps a collecting society-type model
Nick White
Vice-Chair, IBA Copyright and Entertainment Law Subcommittee
The judgment ‘is very interesting because it shows how the technicalities of different generative AI technologies can mean that cases go in different ways,’ says Alina Trapova, a lecturer in law at University College London and a specialist in AI, IP and the creative industries. She highlights a November judgment against OpenAI by the Munich Regional Court. In that case, Germany’s music rights society GEMA claimed that OpenAI had harvested protected lyrics by popular artists to train the language models behind ChatGPT. OpenAI argued that its chatbot didn’t store specific training data, only statistical knowledge in its parameters, and that any infringements were caused by user prompts.
The judge ruled that AI training constitutes ‘reproduction’ under German copyright law. The Court said that even a ‘fixation’ of copyrighted works – ie, a work captured in a tangible, permanent form – in the AI model’s numerical probability values qualifies as a reproduction, as long as the work can later be perceived through technical means.
The Court also found that ChatGPT reproduces complete works from its training data (‘memorisation’), taking it outside the EU’s text and data mining exceptions, which permit the use of copyrighted works to train AI models only in certain circumstances and provided rightsholders have the option to opt out. OpenAI is considering an appeal.
In the US, the Bartz v Anthropic class action saw a group of authors and publishers sue the AI company for allegedly infringing copyright by using material from pirated books to train its Claude large language models (LLMs). The Court ruled that Anthropic’s use of books to train its AI system without permission was fair use and didn’t breach copyright law. However, the judge said that copying and storing over seven million pirated books in a central library did infringe copyright. Anthropic has agreed to pay a $1.5bn settlement.
In Kadrey v Meta, a group of authors claimed the tech company had infringed their copyright by training its Llama LLMs on their work. Judge Vince Chhabria ruled that, in the absence of meaningful evidence from the authors of market dilution – ie, an indirect market harm whereby the AI generates ‘new’ but competing content that substitutes for human-created works – the copying and training were fair use.
Although Judge Chhabria acknowledged that no other copyright use ‘has anything near the potential to flood the market with competing works the way that LLM training does,’ he said the weakness of the evidence presented by the plaintiffs meant this failed to ‘move the needle.’
‘The content industry’s current weakness relates to evidence,’ says Daniela De Pasquale, an officer of the IBA Technology Law Committee. She explains that none of the major rulings against plaintiffs to date denies that the content industry has been damaged, or that the works in question were copyrighted; rather, the claims failed simply for lack of evidence of copyright infringement.
A distinction should be made between AI input and output, adds De Pasquale, who’s a partner at Ughi e Nunziante in Milan. ‘The smoking gun is always the output. If the output is the full lyrics of a song or a passage from a book, then there must have been a corresponding input, in which case unauthorised training has taken place and there is evidence of copyright infringement.’ However, it’s often not possible to identify the output as a copy, perhaps because filters have been used to avoid it, ‘making it harder to provide sufficient evidence of copyright infringement,’ she says.
There’s a question as to whether it’s possible to strike a balance between the needs of AI developers and the need to ensure copyright holders are remunerated for their work. White, who’s a partner at law firm Level in London, believes ‘there will be some form of meeting in the middle, perhaps a collecting society-type model’, whereby an organisation licenses copyrighted works on behalf of its members.
The history of tech innovation also indicates that cooperation is possible, says De Pasquale, noting that litigation against peer-to-peer platforms in the 2000s led to the formation of a music streaming industry in which tech innovators and music labels work together. ‘My personal feeling is that something similar will happen with Generative AI,’ she says.
Stronger transparency rules around AI training could also help prevent costly legal battles in the future. The EU’s AI Act requires GenAI companies to design models that prevent the generation of illegal content and to publish summaries of the copyrighted data used for training. Most major EU and US developers have signed up to the EU’s code of practice for GenAI, which sets out commitments on transparency, copyright and safety.
Header image: Dee karen/Adobe Stock