Author lawsuits pose danger for OpenAI, Meta

A series of potentially damaging lawsuits by authors could force ChatGPT owner OpenAI and Meta to either radically change how their AI engines ingest copyrighted material or eventually pay out millions in compensation to authors.

ChatGPT developer OpenAI went hell for leather to be first to market with its sophisticated artificial intelligence chatbot, ingesting vast amounts of online material without regard for ownership.

This helped ChatGPT gain first-to-market notoriety and helped OpenAI attract billions in investment. However, it could ultimately come back to haunt OpenAI, which is increasingly being challenged in the courts by the publishing community.

Meta’s AI engine, LLaMA, is also in the publishing community’s sights. It is a large language model with up to 65 billion parameters, developed by Meta and released in February 2023 to compete with ChatGPT.

Comedian Sarah Silverman is among the high-profile authors taking on OpenAI for copyright breaches.

Authors and publishers failed to stop Google when it began scanning their texts in 2004, in the early days of the Google Books Library Project. The Authors Guild sued in 2005, but US courts eventually ruled that the scanning was fair use. The episode set a precedent for the mass copying of people’s intellectual property online.

But almost 20 years later, authors are fighting back. While the process of addressing copyright breaches in the courts is slow, the lawsuits now being filed in the US could have major longer-term repercussions.

US lawyers Joseph Saveri and Matthew Butterick last week announced they had filed a class-action lawsuit against OpenAI on behalf of three more authors: Sarah Silverman, Christopher Golden and Richard Kadrey. This was in addition to another class action filed late last month on behalf of authors Paul Tremblay and Mona Awad.

In their statement they branded ChatGPT and Meta’s LLaMA as “industrial-strength plagiarists that violate the rights of book authors”.

Award-winning novelist Christopher Golden is also suing for AI-related copyright breaches.

The two attorneys have also filed a lawsuit over AI image generator Stable Diffusion, which was built on five billion digital images that they say include copyrighted material.

The lawsuits pose a major test for the courts. If judges find no copyright breach, they will effectively be sanctioning AI engines ingesting massive amounts of privately owned copyrighted material that is published online.

“Since the release of OpenAI’s ChatGPT system in March 2023, we’ve been hearing from writers, authors, and publishers who are concerned about its uncanny ability to generate text similar to that found in copyrighted textual materials, including thousands of books,” the attorneys said in a joint statement.

The lawsuits focus on the use of authors’ text as training material, which AI systems ingest to gain their skill at understanding and producing language.

Meta is also being targeted for alleged copyright breaches related to its AI language model LLaMA.

“When ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works—something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works,” the claim says.

“Much of the material in the training datasets used by OpenAI and Meta comes from copyrighted works—including books written by Plaintiffs—that were copied by OpenAI and Meta without consent, without credit, and without compensation,” the pair say in their statement.

“Books in particular are recognized within the AI community as valuable training data. A team of researchers from MIT and Cornell recently studied the value of various kinds of textual material for machine learning. Books were placed in the top tier of training data that had ‘the strongest positive effects on downstream performance’.

“Books are also comparatively ‘much more abundant’ than other sources, and contain the ‘longest, most readable’ material with ‘meaningful, well-edited sentences’.”

In the litigation statement, the plaintiff authors are described as highly successful, high-profile figures. Sarah Silverman is a two-time Emmy Award-winning comedian, actress and author of the bestselling memoir “The Bedwetter: Stories of Courage, Redemption, and Pee”.


Christopher Golden is the New York Times bestselling, Bram Stoker Award-winning author of Road of Bones, Ararat, Snowblind, and Red Hands, while Richard Kadrey is the New York Times bestselling author of the Sandman Slim supernatural noir series.

They join Paul Tremblay, whose many books include the crime novels The Little Sleep and No Sleep Till Wonderland, and Mona Awad, the author of Bunny, named a Best Book of 2019 by Time, Vogue, and the New York Public Library.

The lawsuits have been filed in the US District Court in San Francisco with the authors seeking damages and a civil trial by jury.

There are further implications for the AI models from these trials. Could OpenAI and Meta be forced to recast their learning algorithms to exclude copyrighted materials? That would be a huge deal.

Published by Channel News Australia.
