Pending Lawsuits against Artificial Intelligence

Computer software can now mimic human creativity in fields like art, coding, writing, and music. Because there is not yet an established legal framework for these technologies, and because creators are justifiably concerned that they are about to disrupt their industries, lawsuits are being filed.

A Trio of Lawsuits Challenging Artificial Intelligence

Court battles over AI training methods will help shape the legal landscape for this new technology.

As of early 2023, there are three major cases to watch. Each of them focuses on the claim that the large datasets used to train artificial intelligence contain copyrighted, licensed, or trademarked works, and that it is therefore improper to use those datasets to train artificial intelligence to create new works. The three cases are interesting not only because the decisions reached in them will affect how the industry evolves, but also because the attorneys in each case are taking slightly different philosophical positions.

It may seem odd to think about philosophy in the context of litigation, but in this field in particular there is room for these kinds of questions. Copyright law is a creature of statute, but what constitutes a “fair use” of copyrighted material under federal law is entrusted largely to the common law. That means individual judges have considerable room to make decisions they believe implement the legislative intent and social purposes of copyright law. Accordingly, I expect these cases to be very interesting and to raise large-scale questions about how we want our society organized. I also expect the outcomes to be harder than usual to predict. That is not to say there is no guiding case law. There is plenty, with cases stretching back decades involving such well-known companies as Sega and Google. But with each new change in technology, new law can develop. I believe we may see that here.

Andersen v. Stability AI Ltd.

The first case to watch is Andersen v. Stability AI Ltd. This lawsuit is a putative class action, headlined by a popular online cartoonist, against the makers of image-generating artificial intelligence. Like the other lawsuits, the plaintiff’s major argument is that her work, and that of other artists, was used to train the software and that this was not a fair use.

The plaintiff here argues that training an artificial intelligence is basically just a new kind of compression algorithm, and that when images are generated, the result is just a collage of the compressed works. This argument may be technically suspect. A compression system, like using Windows to “zip” a file, allows the original to be reconstituted perfectly or within certain loss tolerances. Image-generating software does not. And, from a purely mathematical standpoint, it is probably not possible to compress the many terabytes of training data into the much smaller models trained on that data.
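A rough back-of-the-envelope calculation shows the scale problem. The figures below are illustrative assumptions in the neighborhood of publicly reported numbers for some image generators (roughly two billion training images and a model checkpoint of a few gigabytes); they are not evidence from any of these cases:

```python
# Illustrative figures only (assumptions, not case evidence):
TRAINING_IMAGES = 2_000_000_000   # assumed size of the training set
MODEL_SIZE_BYTES = 4 * 1024**3    # assumed ~4 GB model checkpoint

bytes_per_image = MODEL_SIZE_BYTES / TRAINING_IMAGES
print(f"Storage available per training image: {bytes_per_image:.2f} bytes")
# Prints roughly 2 bytes per image, far too little to hold even a
# heavily compressed copy of each work, which is why the model is
# better understood as storing statistical patterns than copies.
```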

The plaintiff tries to address this by comparing any differences between the training set and the produced works to compression loss, but it is unclear to me how much success she will have with this analogy. There may be a meaningful difference between loss due to compression, like artifacts in a streamed video, and the kind of “loss” involved in artificial intelligence, where the software is not attempting to record the data from the original image but rather the relationships between text prompts and certain patterns. One of these is simply loss of information; the other seems to involve not collecting certain data in the first place.
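The distinction can be made concrete. A conventional compression tool, by design, keeps enough information to recover the original; a trained model keeps only parameters. A minimal sketch using Python’s standard zlib module:

```python
import zlib

original = b"the complete bytes of a copyrighted image"

# Conventional compression: the original is recoverable, exactly,
# from what is stored. "Loss" in a lossy codec is a controlled
# degradation of this same stored record.
stored = zlib.compress(original)
assert zlib.decompress(stored) == original

# A trained model, by contrast, stores parameters encoding
# relationships between prompts and visual patterns. There is no
# decompress() that returns a training image; whether that still
# amounts to a "copy" is the question the litigation will test.
```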

Another objection to the compression analogy is that it is unclear where the line would be drawn between what is compression and what is not. What about a human being who studies the art style of another in order to create new works in the same style? The plaintiff in Andersen makes a nod to this argument, pointing out that it is rare for any individual human artist to put forth the time and talent to do this, even if it is theoretically possible, and no human could ever hope to do it for every other artist all at once. In other words, the sheer scope and speed at which artificial intelligence operates make it different in character from a human being performing the same task. This is an interesting point, and I wonder whether the law will eventually start drawing lines around when artificial intelligence is “too good” at something and so requires different legal treatment than a human doing the same task.

Doe v. GitHub

In Doe v. GitHub, individuals who contributed code and other information to an online coding repository, GitHub, are suing that company and others who used the code as training data for a large language model that helps programmers write their own code. This lawsuit, that is, concerns artificial intelligence that generates computer code rather than artwork. Like the plaintiff in Andersen, the plaintiffs in GitHub hasten to point out that even though a human being could theoretically read and learn from GitHub and then write new code based on what was learned, there is no way a human could “brute force” their way through the massive amount of data online and discover the kinds of patterns between words the way a large language model can.

While the GitHub plaintiffs do not argue that the artificial intelligence is merely a compression algorithm, they make a similar philosophical argument by pointing out that the artificial intelligence does not “understand” the code it is writing in the way a human would. Rather, it is simply reproducing or copying patterns it saw in the training data. The other side has responded that less than 1% of the system’s output matches any of its training input, and so it is not fair to say that the system is merely copying other work.

The GitHub plaintiffs also make a point about the licenses involved in accessing publicly available information for training purposes. Contributors to GitHub often provided their code under specific kinds of open source licenses, which allow others to use the code only subject to certain conditions. In particular, the plaintiffs point out, much of that code can be used only with attribution. Since the artificial intelligence system does not reproduce the attributions of the authors it trained on when it produces work product, the license may be violated, or so the argument goes.
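For a concrete sense of the condition at issue, consider the MIT license, one of the permissive licenses common on GitHub: reuse is broadly allowed, but the copyright and permission notice must travel with the code. A hypothetical example, with an invented author and function:

```python
# Copyright (c) 2021 Jane Developer (hypothetical author)
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software, to deal in the Software
# without restriction ... subject to the condition that the above
# copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software. (Abridged.)

def quicksort(items):
    """Sort a list of comparable items using quicksort."""
    if len(items) <= 1:
        return items
    pivot, rest = items[0], items[1:]
    return (quicksort([x for x in rest if x < pivot])
            + [pivot]
            + quicksort([x for x in rest if x >= pivot]))
```

If a model trained on this file later emits the quicksort function without the notice, the plaintiffs argue, the condition on which the code was shared has been broken, whatever the copyright status of the output itself.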

Getty Images v. Stability AI

Whereas Andersen involves artists suing over generative art, Getty Images v. Stability AI involves a stock photo company doing the same. The plaintiff here operates a service that collects photographs, organizes them, and writes descriptive captions. When someone wants a photograph for some purpose, such as a stock image, they type a search query and are shown photos that might do the job. The user experience with most artificial intelligence art programs is very similar: type in a prompt, get a number of possible images. This is interesting because, while both cases argue that the artificial intelligence threatens to replace the plaintiffs in the market, in Getty Images the argument seems much more direct, because the business models and user interfaces are extremely similar.

The plaintiff in Getty Images also points out that artificial intelligence trained on its photographs will, not infrequently, reproduce its trademarked watermark or something similar. This occurs because the arrangement of pixels making up the watermark is seen as just another attribute of the image to be learned, so the artificial intelligence might come to associate it with words like “photo” or “professional” or whatever else often occurs in the text descriptions of training photos. According to the plaintiff, this gives rise to independent claims for trademark infringement on top of the copyright claims.

Another point made more forcefully by Getty Images than by the other plaintiffs, at least so far, is that the process of training the artificial intelligence may involve creating “intermediate copies” of the copyrighted work. That is, even if the final output of the program is not itself an infringing copy, direct copies were made along the way for training purposes. Getty Images suggests that this intermediate act may be where the copyright infringement occurred.
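The mechanics behind this argument are easy to picture. A typical training pipeline, sketched below with hypothetical names, fetches each image and holds a complete copy on the training machine while computing updates to the model, even though no image survives in the finished weights:

```python
import urllib.request

def train_step(model, image_bytes, caption):
    """Hypothetical placeholder for one model update."""
    ...

def train(model, dataset):
    # 'dataset' is assumed to be an iterable of (image_url, caption) pairs.
    for image_url, caption in dataset:
        # This download is the "intermediate copy": a complete, exact
        # reproduction of the work exists on the training machine,
        # however briefly, before it is discarded.
        with urllib.request.urlopen(image_url) as response:
            image_bytes = response.read()
        train_step(model, image_bytes, caption)
        # image_bytes goes out of scope here; only the updated model
        # parameters persist.
```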

Pay Attention While the Law Catches Up

Anyone interested in using artificial intelligence to generate work product in their business should be paying attention to these cases. The decisions in them will probably be the first to set the ground rules for how artificial intelligence can be trained and who is liable for misuse. I am seeing many new businesses rush to implement artificial intelligence now, before the cases are resolved. I understand that this is tempting and may, in the short term, even be a good business decision. But rushing into this area before the courts have laid down the rules carries a lot of risk.

At a minimum, businesses should be aware of the unsettled state of the law surrounding how these models are trained, which is the common issue in the three major pending cases. Knowing how your artificial intelligence model was trained, and by whom, is helpful. Software contracts often contain indemnities against third-party copyright claims, and those clauses should probably be updated to account for the kinds of arguments being leveled against artificial intelligence. Insurance policies, too, may need to be updated to ensure that the risks involved in this new area of business are covered. Above all, businesses should proceed with caution as they start to leverage the power of these new systems.

These cases are the tip of the iceberg on issues related to artificial-intelligence-generated work. I suspect we will see legislation, regulation, or court decisions addressing other topics: forgery and fraud committed by using artificial intelligence to impersonate artistic or literary styles; identity theft carried out with artificial intelligence; cheating scandals in education, application processes, and test-taking; employment disputes over the use of artificial intelligence to generate work in the workplace; if, how, and when artificial intelligence can fit into the creation of copyrighted works; and disputes over the unauthorized practice of law, medicine, or similar regulated professions, just to name a few.

These risks should not discourage you from exploring the benefits of this new technology in its current form. Disruptive technology promises big changes to the market, and thus big opportunities for those who can harness it. But doing so without adequate legal protection is not a great idea.