US courts ruled that training AI models by copying copyrighted books is “transformative” and qualifies as fair use. Anthropic and Meta secured wins, though allegations of pirating remain subject to further legal scrutiny. With over 21 related lawsuits underway, the rulings provide partial clarity but leave unresolved questions about market impact and creator compensation.
The rise of generative Artificial Intelligence (AI) models such as ChatGPT and Gemini has sparked a legal and ethical debate: are these powerful AI systems built on creative work stolen from authors, artists, and musicians?
Generative AI models learn from vast datasets to produce new content, from text and images to music. Their ability to generate human-like outputs depends entirely on the quality and quantity of the data they are trained on, which includes copyrighted material found across the internet.
How AI Models Learn => Generative AI models identify patterns and relationships within massive amounts of training data, allowing them to create novel outputs in response to user prompts.
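For intuition only, the short Python sketch below is a deliberately simplified, hypothetical illustration of this idea: it "learns" which word tends to follow which in a tiny sample text and then generates new sequences from those patterns. Production LLMs use neural networks trained on vastly larger datasets, but the basic principle of learning statistical patterns from training text and producing novel output is the same.

```python
# Toy illustration only: a tiny bigram "language model".
# It learns which word follows which in a small sample text,
# then generates new text by sampling from those learned patterns.
import random
from collections import defaultdict

training_text = (
    "the court ruled that training on copyrighted books can be fair use "
    "the court also said that storing pirated books may infringe copyright"
)

# Learn the patterns: record which words follow each word in the training data.
followers = defaultdict(list)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    followers[current_word].append(next_word)

# Generate novel output by repeatedly sampling a likely next word.
def generate(start_word: str, length: int = 10) -> str:
    output = [start_word]
    for _ in range(length):
        choices = followers.get(output[-1])
        if not choices:
            break
        output.append(random.choice(choices))
    return " ".join(output)

print(generate("the"))
```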
Plaintiffs' Argument => Writers, music labels, news agencies, and artists argue that training AI models on their copyrighted works without permission or compensation amounts to infringement. They claim that AI-generated content directly competes with and weakens the market for their original creations, harming their livelihoods.
Tech Companies' Defence => Tech companies claim that their use of copyrighted material falls under "fair use," a legal doctrine that permits limited use of copyrighted works for purposes like criticism, comment, news reporting, teaching, scholarship, or research. They argue that AI models create "transformative" works, meaning they process the original data to produce something new and different, rather than merely reproducing it.
Recently, two judgments in US courts have sided with tech companies, presenting the first judicial interpretations of this complex issue. However, these rulings are not outright victories for AI firms, as questions regarding the use of pirated data persist.
Case 1: Writers v/s Anthropic
In August 2024, journalist-writers Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson filed a class-action lawsuit against Anthropic, the developer of the Claude family of Large Language Models (LLMs).
Plaintiffs' Claim => The writers alleged that Anthropic downloaded and used pirated versions of their books from "Books3"— an online shadow library containing millions of pirated copies—to train its LLMs. They argued that Anthropic neither compensated the authors nor prevented its AI from generating texts that writers would otherwise be paid to create.
Anthropic's Defence => While Anthropic did use pirated libraries like Books3, it also claimed to have invested millions in purchasing and digitally scanning copyrighted books for its "research library."
Judgement => The court ruled that Anthropic's use of copyrighted data for training was "fair use." It underlined the "transformative" potential of AI, observing that AI models train upon works "not to race ahead and replicate or supplant them — but to turn a hard corner and create something different." However, the judge also found that the copying and storage of pirated books, as distinct from the training itself, did infringe copyright.
Case 2: Writers v/s Meta
Thirteen published authors filed a class-action suit against Meta, the company behind the Llama LLMs.
Plaintiffs' Claim => Similar to the Anthropic case, the authors contended that Meta's Llama LLMs "copied" vast amounts of their copyrighted text, with AI responses essentially derived from this training data.
Meta's Defence => Meta admitted to training its models on shadow libraries like Books3, Anna's Archive, and Libgen. However, it argued that it "post-trained" its models to prevent them from "memorizing" and outputting copyrighted material. Meta claimed its models could not generate more than 50 words from the plaintiffs' books.
Judgement => The court ruled that the plaintiffs failed to prove that Llama's outputs diluted the market for their works. The judge explained that an LLM capable of generating endless biographies derived from copyrighted ones would harm the market, but that such harm had not been established in this case.
Piracy Remains an Issue => Both rulings indicate that while the act of training might be considered "fair use" for transformative purposes, the source of the training data—especially if pirated—can still constitute copyright infringement, leading to potential damages.
Ongoing Lawsuits => The legal landscape remains highly active, with over 21 related lawsuits against AI companies still underway in US courts.
Indian Context => India is also witnessing similar legal challenges. In 2024, news agency ANI filed a case against OpenAI, alleging the unauthorised use of its copyrighted material. The Digital News Publishers Association (DNPA), which represents major Indian media houses such as The Indian Express, Hindustan Times, and NDTV, has joined these proceedings. This indicates that AI and copyright will be a major issue in India as well.
Must Read Articles:
Generative AI vs Copyright Law
PRACTICE QUESTION Q. "Artificial Intelligence presents both a transformative opportunity and a profound challenge to humanity." Critically analyze. (150 words)