Can AI Models Use Copyrighted Content for Training?
- Training AI models involves large-scale ingestion of data from across the internet, including:
  - Public domain content (free to use)
  - Copyrighted material, which raises legal and ethical concerns
- The key legal question: Does using copyrighted data for training constitute copyright infringement?
- Fair use doctrine (U.S.) and text and data mining exceptions (EU, U.K.) are invoked to justify such use
- However, unauthorised data scraping and the use of pirated content remain a grey area with potential liability
Relevance: GS 3 (IPR, AI Technology)
Key U.S. Court Judgments (2025)
Thomson Reuters v. Ross Intelligence
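- Judge Stephanos Bibas ruled that Ross's use of Westlaw headnotes to train a competing legal research tool was not transformative and did not qualify for fair use
- The case involved a non-generative AI tool, so its reasoning may not carry over directly to generative AI training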
Bartz v. Anthropic
- Judge William Alsup ruled:
  - Training on copyrighted works was transformative, comparable to a human learning from what they read
  - However, fair use does not cover illegally sourced material, so the claims over pirated copies were sent to trial
Kadrey v. Meta
- Judge Vince Chhabria ruled in Meta’s favour:
  - Plaintiffs failed to present evidence of market harm
  - Meta’s use of copyrighted works to train its models was therefore treated as fair use on the record before the court
  - Monetisation of AI models was acknowledged but not penalised under current law
Legal Distinction: Public Domain vs Copyrighted Content
| Criteria | Public Domain | Copyrighted Material |
| --- | --- | --- |
| Usage by AI | Freely allowed | Needs permission or fair use defence |
| Ownership Issues | No ownership | Owned by author/creator |
| Legal Risks | None | Possible infringement, market dilution |
| Fair Use Defence Needed? | No | Yes, if used without licence |
Implications for India’s IP Framework
- Copyright Act, 1957:
  - Section 14: Grants exclusive rights to reproduce, adapt, and communicate the work to the public
  - Section 52: Lists “fair dealing” exceptions (not identical to U.S. “fair use”)
- No AI-specific copyright provisions, but courts may interpret existing law to cover AI training
- Indian law can treat legal persons (e.g. companies) as authors in certain IP cases, but the authorship of AI-generated content remains unclear
- Enforcement includes civil and criminal remedies for infringement, including digital piracy
- The pending ANI v. OpenAI case before the Delhi High Court may shape India’s policy stance on AI and copyright
Global Regulatory Ambiguity
- No harmonised international framework yet on AI and IP
- Differences in interpretation across jurisdictions (U.S., EU, India, U.K.)
- Key issues lacking clarity:
  - Who owns AI-generated content?
  - Can data mining for AI be exempt from infringement?
  - Does AI output qualify as “original work” under IP law?