Meta Wins Case Over Its Use of Copyright-Protected Content to Train AI

One of the most significant (yet less flashy) considerations around the new wave of generative AI tools is copyright, both in terms of usage (can you own the rights to an AI-generated work?) and generation (are AI projects stealing artists’ work?).
Both, at least at present, fall into awkward legal territory, because copyright laws, as they exist, weren’t designed to cater to AI content, which means that, technically, infringement remains difficult to prosecute on either front.
Today, Meta had a big court win on this front, with a federal judge ruling that the company did not violate copyright law in training its AI models on original works.
Back in 2023, a group of authors, including high-profile comedian Sarah Silverman, launched legal action against both Meta and OpenAI over the use of their copyrighted works to train the companies’ respective AI systems. The authors were able to show that these AI models could reproduce their work in highly accurate form, which, they claimed, demonstrated that both Meta and OpenAI had used their legally protected material without consent. The lawsuit also alleged that both companies removed the copyright information from the books to hide this infringement.
In his assessment, Judge Vince Chhabria ruled that, for Meta’s part at least, the company’s use of these works serves a “transformative” purpose, and that Meta’s tools are not designed to re-create competing works.
As per the judgment:
“The purpose of Meta’s copying was to train its LLMs, which are innovative tools that can be used to generate diverse text and perform a wide range of functions. Users can ask Llama to edit an email they have written, translate an excerpt from or into a foreign language, write a skit based on a hypothetical scenario, or do any number of other tasks. The purpose of the plaintiffs’ books, by contrast, is to be read for entertainment or education.”
The argument, then, is that this case sought to prosecute the tool, not the purpose. As a basic comparison, knives can kill people, but you can’t take legal action against a knife maker over a murder, because the knife was used in a way it wasn’t intended to be.
As such, the judge ruled that because the re-use of the works was not intended to create a competing market for those works, “fair use” applies in this case.
But there are a lot of provisos in the ruling.
First, the judge notes that the case “presented no meaningful evidence on market dilution at all,” and without that element spelled out in the arguments, Meta’s fair use defense stands.
In other words, the case presented the argument that this should not be allowed, but showed no examples of how Meta’s tools had led to re-creations that harmed the plaintiffs’ earnings.
The judge also notes that:
“In cases involving uses like Meta’s, it seems like the plaintiffs will often win, at least where those cases have better-developed records on the market effects of the defendant’s use. No matter how transformative LLM training may be, it’s hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books. And some cases might present even stronger arguments against fair use.”
So, essentially, the judge is saying that on a smaller, case-by-case scale, where a claimant can demonstrate that their work has been replicated by AI, and that they’ve been harmed as a result, the creator will probably win. But on a broad scale, without direct evidence of such harm, this use is not a violation in itself.
So while the ruling may seem like a blow for artists, enabling generative AI projects to essentially steal their work for their own purposes, the judge is really saying that there’s likely a legal basis for such enforcement, one that would apply in many cases and would potentially enable artists to argue that AI replication violates copyright. This particular case just hasn’t made it.
That’s not ideal for artists seeking legal protection against generative AI projects and unlicensed usage of their work, with this case serving as a litmus test, of sorts, on those parameters.
Last week, a federal judge ruled in favor of Anthropic in a similar case, which essentially enables the company to continue training its models on copyright-protected content.
The sticking point here is the definition of “fair use,” and what constitutes “fair” in the context of re-use for an alternative purpose. Fair use law is generally designed to protect journalists and academics reporting on material that serves an educational purpose, even if the copyright holder disagrees with that usage.
Do LLMs and AI projects fall into that same category? Well, under the legal definition, yes, because the intent is not to re-create such work, but to facilitate new usage based on elements of it.
And again, in that sense, an individual artist would be better placed to win a case where an AI work has clearly replicated theirs, though that replication would have to be indisputably clear, and there would also, presumably, have to be a level of benefit gained by the AI creator to justify such a claim.
Also, people can’t copyright AI-generated works, which is another wrinkle in the AI legality puzzle.
There’s also a whole other element in both of these cases, which relates to how Meta and Anthropic accessed these copyright-protected materials in the first place, amid claims that the works were stolen from dark web databases for mass training. None of those claims has been proven as yet, though that’s a separate factor, relating to a different type of content theft.
So where do we stand on legal use of generative AI content?
Well, on a broad scale, given the intended use of AI models, it does seem that AI projects can use content that they can legally access under fair use provisions, as they’re not intended to facilitate replication or harm the respective industries.
But they will be used for such, and they will harm industries. The judge in this case has made this very point, but his ruling in this instance relates to the specifics of the case presented.
Essentially, existing copyright laws were not designed with AI generation in mind, and we don’t have any established precedents on the specifics of what should and should not be allowed in model training. Maybe, for example, in future, AI models will be required to get explicit copyright-holder permission, or limit the use of prompts that lead to replication, or maybe AI providers will have to provide details of user prompts in legal filings to enforce copyright in certain cases.
There’s more to be worked out, and this case hasn’t delivered clarity on those specifics.