The US Copyright Office is accepting public comments on prospective new laws governing the use of copyrighted works by generative AI, and the world’s largest AI businesses had lots to say. We’ve compiled responses from Meta, Google, Microsoft, Adobe, Hugging Face, StabilityAI, and Anthropic, as well as an Apple statement centred on copyrighting AI-written code, below.
Their tactics differ slightly, but the main message is consistent: they do not believe they should be required to pay to train AI models on protected content.
The Copyright Office opened the comment period on August 30th, with an October 18th deadline for written comments on proposed changes to the use of copyrighted data for AI model training, on whether AI-generated material can be copyrighted without human involvement, and on AI copyright liability. Over the past year there has been no shortage of copyright lawsuits, with artists, writers, developers, and corporations all alleging infringement in a variety of circumstances.
The following are excerpts from each company’s answer.
Imposing a first-of-its-kind licensing regime now, after the fact, will cause chaos as developers attempt to identify millions and millions of rightsholders for very little benefit, given that any fair royalty due would be incredibly small in light of the insignificance of any single work among an AI training set.
There would be no copyright issues if training could be completed without the creation of copies. Indeed, the act of “knowledge harvesting,” to use the Court’s metaphor from Harper & Row, like reading a book and acquiring the facts and ideas contained within it, would not only be non-infringing, but would also serve the objective of copyright law. The fact that copies are required as a technological matter to extract such ideas and facts from copyrighted works should not change that finding.
Any requirement to obtain permission before accessible works can be used for training would put a damper on AI innovation. Even when the identity of a work and its owner are known, licensing the volume of data required to build responsible AI models is not practicable. Such licensing schemes would also stifle innovation from start-ups and new entrants that lack the resources to obtain licences, leaving AI development to a small group of companies with the resources to run large-scale licensing programmes, or to developers in countries that have determined that using copyrighted works to train AI models is not infringement.
We believe that existing law and continued collaboration among all stakeholders can harmonise the diverse interests at stake, unlocking the benefits of AI while addressing concerns.
The Ninth Circuit ruled in Sega v. Accolade that intermediate copying of Sega’s software was fair use. The defendant made copies while reverse engineering the software to discover the functional requirements (unprotected information) for making games compatible with Sega’s gaming console. This intermediate copying also benefited the public, since it increased the number of independently produced video games (with a mix of utilitarian and artistic characteristics) available for Sega’s platform. The Copyright Act was designed to encourage this kind of expansion of creative expression.
As previously explained, the training process for Claude creates copies of information in order to perform statistical analysis on the data. Copying is merely an intermediate step in extracting unprotectable elements from the entire corpus of works in order to produce new outputs. In this sense, the use of the original copyrighted work is non-expressive; that is, the copyrighted expression is not re-used to communicate with people.
Over the last decade or so, tremendous amounts of money have been invested in the development of AI systems with the understanding that, under present copyright law, any copying necessary to extract statistical data is permissible. A change to this regime would drastically upend settled expectations in the sector. Those expectations have been a major factor in the massive infusion of private capital into U.S.-based AI startups, which has made the United States a global leader in AI. Undermining them would endanger future investment, as well as the United States’ economic competitiveness and national security.
The use of a specific work in training serves a larger purpose: the development of a distinct and productive AI model. Rather than replicating the original work’s precise communicative expression, the model is capable of producing a wide range of outputs that are wholly unrelated to the underlying copyrightable expression. For these and other reasons, training generative AI models on a large number of copyrighted works is generally fair use. However, we say “generally” deliberately, because one can imagine fact patterns that would present harder questions.
Singapore, Japan, the European Union, the Republic of Korea, Taiwan, Malaysia, and Israel have also amended their copyright laws to provide safe harbours for AI training that achieve the same aims as fair use. In the United Kingdom, the Government Chief Scientific Advisor has advised that “if the government’s goal is to promote an innovative AI industry in the UK, it should enable mining of available data, text, and images (the input) and use [sic] existing copyright and IP law on the output of AI.”
When a human developer is in charge of the expressive components of output as well as the decisions to alter, add to, enhance, or even reject proposed code, the final code that arises from the developer’s interactions with the tools will contain enough human authorship to be copyrightable.