A series of potentially damaging lawsuits by authors could force ChatGPT owner OpenAI and Meta to either radically change how their AI engines ingest copyright material or eventually pay out multiple millions in compensation to authors.
ChatGPT developer OpenAI went hell for leather to be first-in-market with its sophisticated artificial intelligence chatbot by ingesting vast amounts of online material without regard for ownership.
This action aided ChatGPT to gain a first-in-market notoriety and for OpenAI to gain billions in investments, however it could ultimately come back to haunt OpenAI which increasingly is being challenged in the courts by the publishing community.
Meta’s AI engine, LLaMA is also in their sights. It is a 65-billion parameter large language model developed by Meta and released in February 2023 to compete with ChatGPT.
Authors and publishers didn’t fight back when in 2004 Google began scanning their texts in the early days of the Google Books Library Project. It set a precedent for the mass theft of people’s intellectual property online.
But almost 20 years later, they are fighting back and, while the process of addressing copyright breaches in the courts is slow, lawsuits are being filed in the US with major longer term repercussions.
US lawyers Joseph Saveri and Matthew Butterick last week announced they had had filed a class-action lawsuit against OpenAI on behalf of three more authors: Sarah Silverman, Chris Golden and Richard Kadrey. This was additional to another class-action filed late last month on behalf of authors Paul Tremblay and Mona Awad.
In their statement they branded ChatGPT and Meta’s LLaMA as “industrial-strength plagiarists that violate the rights of book authors”.
The two attorneys have also filed a lawsuit against AI image generator Stable Diffusion built on five billion digital images which they say includes copyright material.
The lawsuits are a major challenge to the courts which, if they deny copyright breaches, will be sanctioning AI engines ingesting massive amounts of personally owned copyrighted material that is published online.
“Since the release of OpenAI’s ChatGPT system in March 2023, we’ve been hearing from writers, authors, and publishers who are concerned about its uncanny ability to generate text similar to that found in copyrighted textual materials, including thousands of books,” the attorneys said in a joint statement.
The lawsuits focus on the use of author’s text as training material which AI systems ingest to gain their skills at understanding and producing language.
“When ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works—something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works,” the claim says.
“Much of the material in the training datasets used by OpenAI and Meta comes from copyrighted works—including books written by Plaintiffs—that were copied by OpenAI and Meta without consent, without credit, and without compensation,” the pair say in their statement.
“Books in particular are recognized within the AI community as valuable training data. A team of researchers from MIT and Cornell recently studied the value of various kinds of textual material for machine learning. Books were placed in the top tier of training data that had ‘the strongest positive effects on downstream performance’.
“Books are also comparatively “much more abundant” than other sources, and contain the “longest, most readable” material with “meaningful, well-edited sentences”.
In the litigation statement, the plaintiff authors are listed as highly successful and high profile figures. Sarah Silverman is a two-time Emmy Award-winning comedian, actress and author of a bestselling memoir “The Bedwetter: Stories of Courage, Redemption, and Pee”.
Christopher Golden is the New York Times bestselling, Bram Stoker Award-winning author of Road of Bones, Ararat, Snowblind, and Red Hands while Richard Kadrey is the New York Times bestselling author of the Sandman Slim supernatural noir series.
They join Paul Tremblay whose many book include crime novels The Little Sleep and No Sleep Till Wonderland, and Mona Awad, the author of Bunny, named a Best Book of 2019 by Time, Vogue, and the New York Public Library.
The lawsuits have been filed in the US District Court in San Francisco with the authors seeking damages and a civil trial by jury.
There are further implications for the AI models from these trials. Could OpenAI and Meta be forced to recast their learning algorithms to exclude copyrighted materials? That would be a huge deal.
Published by Channel News Australia.