Amrusha Chati
26 October 2023 • 4 min read
AI has intrigued, engaged, amazed, and even amused people worldwide over the last year. But it has also attracted the ire of content creators, from musicians to artists. The latest bout of human creativity vs artificial intelligence saw a group of authors sue Bloomberg, Meta (formerly Facebook), and Microsoft for AI copyright infringement. The lawsuit also names EleutherAI, a non-profit AI research lab.
The group of authors includes former Arkansas governor Mike Huckabee. They have filed a lawsuit in New York federal court accusing the tech giants of using their literary works to train artificial intelligence systems without their consent.
DiCello Levitt, the law firm representing the authors, told Trademarkia on behalf of all plaintiffs:
"Too many companies have been advancing artificial intelligence systems and large language models by all means necessary—including theft of our authors' books. We're not opposed to innovation; we're opposed to the theft behind the innovation. That's why we brought this action."
This class action is the latest in a gathering storm around AI models and IP laws. But what "theft" is the lawsuit referring to? How did it happen? We need to take a closer look at what's powering the "intelligence" of AI technology for these answers.
Generative AI refers to algorithms that can generate new content such as audio, video, code, text, or images.
It includes large language models (LLMs), which power chatbots such as ChatGPT and Bard. These LLMs are trained to imitate "human" skills such as language and creativity.
In fact, when we asked ChatGPT to describe itself, it told us this:
“As a language model, my purpose is to generate human-like responses to natural language inputs. I have been trained on a vast amount of text data and can understand and respond to a wide range of topics and questions.”
But where does this "vast amount of text data" come from?
LLMs are trained using information-laden "datasets." These are created from massive amounts of multimedia material "scraped" off the internet. They consist of millions of original, often copyrighted creative works.
The problem, according to the lawsuits, is that tech companies are using much of it without permission.
The arrival of ChatGPT kicked off a scramble among tech companies like OpenAI, Meta, Google, and Microsoft to build a bigger, better AI platform. And the math is simple for AI: the more you feed it, the more "intelligent" it becomes.
So, everyone wanted these datasets that served as inputs. And "The Pile" gave them exactly that.
The Pile is an 825-gigabyte, open-source language modeling dataset. It combines 22 smaller datasets.
One of these is "Books3".
According to the lawsuit, Books3 is “scraped from a large collection of approximately 183,000 pirated ebooks, most of which were published in the past 20 years.”
This has put a spotlight on major legal issues in the realm of intellectual property. It has also raised copyright concerns about the datasets used to train AI models.
Copyright law provides legal protection to all original, creative works. This protects creators against misuse and infringement of their original work. And content creators are claiming this legal right through a flurry of lawsuits.
Earlier this year, stock photo provider Getty Images filed a copyright infringement lawsuit against Stability AI Inc, accusing it of illegally using over 12 million copyrighted photos to train its popular Stable Diffusion AI image-generation system.
Just last week, on 18 October 2023, Universal Music, ABKCO, and Concord sued AI company Anthropic. The suit alleges that the startup misused an "innumerable" amount of copyrighted song lyrics to train its chatbot, Claude. Anthropic is backed by tech bigwigs like Google, Amazon, and the controversial cryptocurrency billionaire Sam Bankman-Fried.
Each of these suits is among the first in its sector. Writers, by contrast, are bringing a wave of lawsuits against AI companies.
The latest lawsuit by authors, including former Arkansas governor Mike Huckabee, against Meta, Bloomberg, Microsoft, and EleutherAI is not the first. It follows similar suits by almost a dozen famous authors, including George RR Martin, Jodi Picoult, John Grisham, Sarah Silverman, Michael Chabon, and many others.
According to the lawsuit, “Because they are a substantial source of written language, books are often used in libraries of information (or datasets) to create more sophisticated LLMs. While using books as part of datasets is not inherently problematic, using pirated (or stolen) books does not fairly compensate authors and publishers for their work.”
What sets this lawsuit apart is that it's not going after relatively new, dedicated AI companies like OpenAI and Stability AI. Instead, it's taking on established tech giants. The lawsuit claims that the authors are entitled to damages. It alleges that "Microsoft, Meta, and Bloomberg chose to train their LLMs using pirated and stolen works for the purpose of making a profit."
AI technology is growing by leaps and bounds. Businesses, big and small, are rushing to create or ramp up their AI capabilities. Everyone is eager to ride this wave of technological disruption. Intellectual property law is playing catch-up at this point.
However, content creators are becoming increasingly vigilant and protective of their copyrighted content.
As this is new territory for IP law, much of the policy around AI will have to adapt based on real use cases like these. The outcomes of all these cases over the next few months or years will set significant precedents for years to come.
We'll just have to wait to see if these are simply growing pains or if they will lead to a seismic shift in the AI landscape.
AUTHOR
Amrusha is a versatile professional with over 12 years of experience in journalism, broadcast news production, and media consulting. Her impressive career includes collaborating extensively with prominent global enterprises. She garnered recognition for her exceptional work in producing acclaimed shows for Bloomberg, a renowned business news network. Notably, these shows have been incorporated into the esteemed curriculum of Harvard Business School. Amrusha's expertise also encompassed a 4-year tenure as a consultant at Omidyar Network, a leading global impact investing firm. In addition, she played a pivotal role in the launch and content strategy management of the startup Live History India.