Context & Relevance
- Access to diverse and voluminous training data (books, articles, web content) is central to improving Large Language Models (LLMs).
- This includes both public domain and copyrighted works—raising significant legal and ethical issues when used without permission.
Relevance: GS 3 (IPR, Technology)
Central Legal Issue
- Key Question: Does using copyrighted material for LLM training—without authorisation—constitute copyright infringement?
- In the U.S., this hinges on whether the use qualifies as “fair use” under Section 107 of the Copyright Act.
Fair Use Doctrine – Four Factors
Courts evaluate fair use claims based on:
- Purpose & Character: Is the use transformative (e.g., generating new knowledge vs reproducing existing works)?
- Nature of Work: Use of factual works is more likely to qualify as fair use than use of fictional/creative works.
- Amount & Substantiality: How much of the original work was used, and how significant was the portion taken?
- Market Effect: Does the use harm the original’s market or potential licensing revenue?
Case 1: Anthropic PBC (Claude LLM)
- Used copyrighted books—some legally purchased, some from questionable sources—to train its GenAI.
- Court ruling:
  - Training with legally purchased books = fair use (transformative use).
  - Copying from illegal sources = not fair use; the court refused to grant blanket protection.
- Key takeaway: the court distinguished between the transformative nature of the use and the legality of how the data was acquired.
Case 2: Meta (LLaMA LLM)
- Sued by 13 authors for using illegally sourced books for training.
- Court ruling:
  - Training = fair use (highly transformative).
  - Plaintiffs failed to prove market harm with empirical data.
  - Unlike in the Anthropic case, the court did not penalise unauthorised downloading as a separate infringement.
- Judge acknowledged the “market dilution” concern but said proof of harm was lacking.
Comparison: Anthropic vs Meta
| Factor | Anthropic | Meta |
| --- | --- | --- |
| Transformative use | Recognised | Recognised |
| Market harm | Downplayed | Downplayed, but future risks noted |
| Illegal sourcing | Treated as a separate infringement | Not distinctly addressed |
| Judgement focus | Data sourcing and use | Final use only |
Precedent Case: Thomson Reuters v. Ross Intelligence
- The court held there was no fair use because the AI simply retrieved legal opinions rather than transforming them.
- The tool also competed directly with the plaintiff’s product, thereby harming its market.
Emerging Legal Standards
- Courts seem to support transformative use in GenAI training—tilting toward fair use.
- But evidence of market harm will be crucial in future cases.
- Use of illegally sourced data may be treated as a separate violation—creating liability even if training is transformative.
Challenges for Plaintiffs
- Hard to prove “market substitution” or “licensing market harm.”
- LLM outputs are often not reproductions, but generated content—making infringement indirect and difficult to establish.
Implications Going Forward
- Unsettled legal landscape: Outcomes will vary case-by-case, based on data sourcing, model purpose, and market effects.
- Need for clearer copyright licensing frameworks and/or legislative clarity.
- Future rulings may hinge on empirical evidence, including studies of AI’s impact on creative economies.