US courts ruled that training AI models by copying copyrighted books is “transformative” and qualifies as fair use. Anthropic and Meta secured wins, though allegations of pirating remain subject to further legal scrutiny. With over 21 related lawsuits underway, the rulings provide partial clarity but leave unresolved questions about market impact and creator compensation.
The rise of generative Artificial Intelligence (AI) models such as ChatGPT and Gemini has sparked a legal and ethical debate: are these powerful AI systems built on creative work stolen from authors, artists, and musicians?
Generative AI models learn from vast datasets to produce new content, from text and images to music. Their ability to generate human-like outputs depends entirely on the quality and quantity of the data they are trained on, which includes copyrighted material found across the internet.
How AI Models Learn => Generative AI models identify patterns and relationships within massive amounts of training data, allowing them to create novel outputs in response to user prompts.
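For intuition only, the short Python sketch below is a deliberately simplified, hypothetical illustration of this idea: it "learns" which word tends to follow which in a tiny sample text and then generates new sequences from those patterns. Production LLMs use neural networks trained on vastly larger datasets, but the basic principle of learning statistical patterns from training text and producing novel output is the same.

```python
# Toy illustration only: a tiny bigram "language model".
# It learns which word follows which in a small sample text,
# then generates new text by sampling from those learned patterns.
import random
from collections import defaultdict

training_text = (
    "the court ruled that training on copyrighted books can be fair use "
    "the court also said that storing pirated books may infringe copyright"
)

# Learn the patterns: record which words follow each word in the training data.
followers = defaultdict(list)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    followers[current_word].append(next_word)

# Generate novel output by repeatedly sampling a likely next word.
def generate(start_word: str, length: int = 10) -> str:
    output = [start_word]
    for _ in range(length):
        choices = followers.get(output[-1])
        if not choices:
            break
        output.append(random.choice(choices))
    return " ".join(output)

print(generate("the"))
```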
Plaintiffs' Argument => Writers, music labels, news agencies, and artists argue that training AI models on their copyrighted works without permission or compensation amounts to infringement. They claim that AI-generated content directly competes with and weakens the market for their original creations, harming their livelihoods.
Tech Companies' Defence => Tech companies claim that their use of copyrighted material falls under "fair use," a legal doctrine that permits limited use of copyrighted works for purposes like criticism, comment, news reporting, teaching, scholarship, or research. They argue that AI models create "transformative" works, meaning they process the original data to produce something new and different, rather than merely reproducing it.
Recently, two judgments in US courts have sided with tech companies, presenting the first judicial interpretations of this complex issue. However, these rulings are not outright victories for AI firms, as questions regarding the use of pirated data persist.
Case 1: Writers v/s Anthropic
In August 2024, journalist-writers Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson filed a class-action lawsuit against Anthropic, the developer of the Claude family of Large Language Models (LLMs).
Plaintiffs' Claim => The writers alleged that Anthropic downloaded and used pirated versions of their books from "Books3"— an online shadow library containing millions of pirated copies—to train its LLMs. They argued that Anthropic neither compensated the authors nor prevented its AI from generating texts that writers would otherwise be paid to create.
Anthropic's Defence => While Anthropic did use pirated libraries like Books3, it also claimed to have invested millions in purchasing and digitally scanning copyrighted books for its "research library."
Judgement => The court ruled that Anthropic's use of copyrighted data for training was "fair use." It underlined the "transformative" potential of AI, observing that AI models train upon works "not to race ahead and replicate or supplant them — but to turn a hard corner and create something different." However, the judge also found that the copying and storage of pirated books, as distinct from the training itself, did infringe copyright.
Case 2: Writers v/s Meta
Thirteen published authors filed a class-action suit against Meta, the company behind the Llama LLMs.
Plaintiffs' Claim => Similar to the Anthropic case, the authors contended that Meta's Llama LLMs "copied" vast amounts of their copyrighted text, with AI responses essentially derived from this training data.
Meta's Defence => Meta admitted to training its models on shadow libraries like Books3, Anna's Archive, and Libgen. However, it argued that it "post-trained" its models to prevent them from "memorizing" and outputting copyrighted material. Meta claimed its models could not generate more than 50 words from the plaintiffs' books.
Judgement => The court ruled that the plaintiffs failed to prove that Llama's outputs diluted the market for their works. The judge explained that an LLM capable of generating endless biographies derived from copyrighted ones would harm the market, but that such harm had not been established in this case.
Piracy Remains an Issue => Both rulings indicate that while the act of training might be considered "fair use" for transformative purposes, the source of the training data—especially if pirated—can still constitute copyright infringement, leading to potential damages.
Ongoing Lawsuits => The legal landscape remains highly active, with over 21 related lawsuits against AI companies still underway in US courts.
Indian Context => India is also witnessing similar legal challenges. In 2024, news agency ANI filed a case against OpenAI, alleging the unauthorised use of its copyrighted material. The Digital News Publishers Association (DNPA), which represents major Indian media houses such as The Indian Express, Hindustan Times, and NDTV, has joined these proceedings. This indicates that AI and copyright will be a major issue in India as well.
Must Read Articles:
Generative AI vs Copyright Law
PRACTICE QUESTION Q. "Artificial Intelligence presents both a transformative opportunity and a profound challenge to humanity." Critically analyze. (150 words)