Encyclopaedia Britannica Sues OpenAI — Alleges AI 'Memorized' Copyrighted Text
The publisher of the iconic encyclopedia and dictionary joins a growing list of media and content companies taking legal action against OpenAI, claiming its large language models were trained on their work without permission or compensation.

Key Takeaways
- Encyclopaedia Britannica and its subsidiary Merriam-Webster are suing OpenAI for copyright and trademark infringement.
- The lawsuit alleges that OpenAI's GPT models have 'memorized' copyrighted content and can reproduce it in a 'substantially similar' form.
- This legal action is part of a broader trend of publishers and content creators challenging the data training practices of AI labs.
- The case centers on the core question of whether training AI on publicly available but copyrighted material constitutes fair use.
Encyclopaedia Britannica and dictionary publisher Merriam-Webster have filed a lawsuit against OpenAI, accusing the AI firm of using their copyrighted works to train its large language models. The suit, first reported by Reuters, claims that OpenAI's models do more than just learn from the reference materials; they have effectively 'memorized' the content and can reproduce it in ways that are 'substantially similar' to the original, according to The Verge.
Another Publisher Joins the Legal Fray
The lawsuit, filed in federal court, alleges both copyright and trademark infringement. According to the complaint, OpenAI repeatedly copied Britannica's and Merriam-Webster's content without permission to build its popular ChatGPT service. Fast Company, The Verge, and Engadget all report the same core allegations. The crux of Britannica's argument is that the outputs of OpenAI's models are derivative rather than transformative, and that they compete directly with the publisher's own digital products.
This is not a novel argument, but it comes from one of the most established brands in reference publishing. The lawsuit's claim that GPT-4 has 'memorized' vast portions of their work, as reported by The Verge, goes to the heart of the legal debate surrounding generative AI. If a model is simply regurgitating training data on command, the case for it being a new, transformative work under fair use doctrine becomes significantly weaker.
A Pattern of Copyright Challenges
Britannica is not alone. OpenAI is facing a barrage of similar lawsuits from a wide range of content creators, including The New York Times, The Intercept, and authors like George R.R. Martin. These cases collectively represent a fundamental challenge to the 'scrape everything' model of AI development that has allowed companies like OpenAI to scale so rapidly. The publishers argue that their significant investment in creating, fact-checking, and curating information is being exploited without compensation to build a competing product.
The pattern suggests that the era of AI labs training models on the public internet with impunity may be drawing to a close. The legal and financial risks are mounting. While AI companies have long operated under the assumption that their training processes are protected by fair use, that assumption is now being tested by organizations with the resources to see the fight through. This isn't just about a single encyclopedia; it's about whether the foundational data that powers the current AI boom was legally obtained.
SignalEdge Insight
- What this means: The legal definition of 'fair use' for AI training is being aggressively challenged by legacy content owners, forcing a potential industry-wide reckoning on data sourcing.
- Who benefits: Copyright law firms and rival AI companies that have proactively pursued licensing deals for their training data.
- Who loses: OpenAI, which faces escalating legal costs and the risk of a precedent-setting judgment that could render core parts of its training data legally unusable.
- What to watch: The discovery phase of The New York Times lawsuit against OpenAI, which could reveal exactly how much copyrighted material was used and how it is processed by the models.


