Court Dismisses Authors’ Copyright Infringement Claims Against OpenAI

Bitcoin Warrior 2024-02-14

In recent months, rightsholders of all ilks have filed lawsuits against companies that develop AI models.

The list includes record labels, individual authors, visual artists, and even the New York Times. These rightsholders all object to the presumed use of their work without proper compensation.

Several of the lawsuits filed by book authors include a piracy component. The cases allege that tech companies, including Meta and OpenAI, used the controversial Books3 dataset to train their models.

The Books3 dataset was created in 2020 by AI researcher Shawn Presser, who scraped the library of ‘pirate’ site Bibliotik. The general vision was that this plaintext collection of more than 195,000 books, nearly 37GB in size, could help AI enthusiasts build better models.

The vision wasn’t wrong; large text archives are great training material for Large Language Models, but many authors disapprove of their works being used in this manner, without permission or compensation.

Authors Sue, OpenAI Responds

In a lawsuit filed last June, authors Paul Tremblay and Mona Awad accused OpenAI of direct and vicarious copyright infringement, among other things. Soon after, writer/comedian Sarah Silverman teamed up with authors Christopher Golden and Richard Kadrey in an identical suit.

The complaints allege that the authors’ books were sourced from pirate sites. They specifically mention the controversial Books3 dataset, as well as data from other shadow libraries such as LibGen, Z-Library, and Sci-Hub.

“The books aggregated by these websites have also been available in bulk via torrent systems. These flagrantly illegal shadow libraries have long been of interest to the AI-training community…,” the authors wrote.

OpenAI didn’t deny these allegations directly but nevertheless disagreed that using books to train AI amounts to vicarious copyright infringement or violations of the DMCA.

In a motion to dismiss, OpenAI asked the California federal court to ‘trim’ the scope of the case. The only claim that should survive is direct copyright infringement, OpenAI argued, adding that it expects to defeat that claim at a later stage.

Court Dismisses Copyright and DMCA Claims

After reviewing input from both sides, California District Judge Araceli Martínez-Olguín ruled on the matter. In her order, she largely sides with OpenAI.

The vicarious copyright infringement claim fails because the court doesn’t agree that all output produced by OpenAI’s models can be seen as a derivative work. To survive, the infringement claim has to be more concrete.

“Plaintiffs’ allegation that ‘every output of the OpenAI Language Models is an infringing derivative work’ is insufficient. Plaintiffs fail to explain what the outputs entail or allege that any particular output is substantially similar – or similar at all – to their books,” the order reads.

In addition to copyright infringement, the authors accused OpenAI of violating the DMCA by intentionally altering the copyright management information (CMI). Details such as the title, the author, and the copyright owner were allegedly stripped to “enable” or “conceal” infringement.

Judge Martínez-Olguín sees no evidence of the intentional removal of this copyright information. And, even if these allegations are true, there’s no evidence that it was done for nefarious reasons.

“Plaintiffs argue that OpenAI’s failure to state which internet books it uses to train ChatGPT shows that it knowingly enabled infringement, because ChatGPT users will not know if any output is infringing.

“However, Plaintiffs do not point to any caselaw to suggest that failure to reveal such information has any bearing on whether the alleged removal of CMI in an internal database will knowingly enable infringement.”

The authors further claimed that OpenAI distributed their works without CMI, which would also violate the DMCA. This argument fails too, the court ruled, as OpenAI didn’t distribute full copies of books.

“Instead, [the authors] have alleged that ‘every output from the OpenAI Language Models is an infringing derivative work’ without providing any indication as to what such outputs entail – i.e., whether they are the copyrighted books or copies of the books,” the order reads.

Direct Copyright Infringement Claim Remains

In addition to the vicarious copyright infringement and the DMCA violations, Judge Martínez-Olguín also dismissed the California Unfair Competition Law (UCL) claims for ‘unlawful business practice’, ‘fraudulent conduct’, ‘negligence’, and ‘unjust enrichment’. The UCL claim for ‘unfair practices’ can proceed.

This isn’t the end of the legal battle. The authors have the chance to file an amended complaint to correct any shortcomings, should they wish to proceed with the dismissed claims.

Finally, it’s worth reiterating that the direct copyright infringement claim wasn’t covered by OpenAI’s motion to dismiss, so that claim will move forward as well, as will many of the other AI copyright lawsuits.


A copy of California District Judge Araceli Martínez-Olguín’s order on the motion to dismiss is available here (pdf).

From: TF, for the latest news on copyright battles, piracy and more.
