The New York Times (NYT) is taking on tech giant Microsoft (MS) and The Disrupters – the OpenAI group of companies (OAI) – in the United States District Court for the Southern District of New York, alleging that MS and OAI are copying and using NYT’s work and “massive investment in journalism” without permission or payment to create generative AI (GenAI) tools and products, like Microsoft’s Copilot (formerly Bing Chat) and OpenAI’s ChatGPT, that compete with it. This, according to NYT, translates into vast savings and profits for MS and OAI.

The stakes are high. There are hundreds of stories, books, and movies in the “computer takes over the world” genre. E.M. Forster’s short story “The Machine Stops” was published in 1909. In a 2019 essay for The New Yorker, Oliver Sacks, sensitive to the difference between knowledge and information, explained how, in “The Machine Stops”, Forster imagined “a future where people live underground in isolated cells, never seeing one another and communicating only by audio and visual devices. In this world, original thought and direct observation are discouraged – ‘Beware of first-hand ideas!’ people are told.”

NYT and MS need no introduction. According to NYT, OAI is “a commercial enterprise valued as high as $90 billion, with revenues projected to be over $1 billion in 2024”. Its founders, states NYT, are “some of the wealthiest technology entrepreneurs and investors and companies like Amazon Web Services and InfoSys. This group included Elon Musk, the CEO of Tesla and X Corp. (formerly known as Twitter); Reid Hoffman, the co-founder of LinkedIn; Sam Altman, the former president of Y Combinator; and Greg Brockman, the former Chief Technology Officer of Stripe”.

NYT’s claims against MS and OAI are for copyright infringement, vicarious copyright infringement, contributory copyright infringement, the removal of copyright management information (in contravention of the Digital Millennium Copyright Act), unfair competition by misappropriation, and trade mark dilution.

The New York Times is seeking statutory damages; compensatory damages; restitution and disgorgement of profits, which NYT alleges are massive:

“Microsoft’s deployment of Times-trained LLMs throughout its product line helped boost its market capitalization by a trillion dollars in the past year alone and OpenAI’s release of ChatGPT has driven its valuation to as high as $90 billion.

Each Defendant has reaped substantial savings by taking and using—at no cost— New York Times content to create their LLMs. Times journalism is the work of thousands of journalists, whose employment costs hundreds of millions of dollars per year. Each Defendant has wrongfully benefited from nearly a century of that work—some performed in harm’s way—that remains protected by copyright law. Defendants have effectively avoided spending the billions of dollars that The Times invested in creating that work by taking it without permission or compensation.”

In addition, NYT is seeking any other appropriate relief permitted in law or equity; an order permanently enjoining (interdicting) MS and OAI from the alleged unlawful, unfair, and infringing conduct described in NYT’s complaint; and – Stop the Machines – the destruction of all GPT or other large language models (LLMs) and training sets that incorporate NYT’s copyright work.

Copyright laws protect original works of expression, including literary works in the form of articles and other journalistic content. NYT, as the owner of the copyright in this content, has the exclusive right to do certain acts in relation to it, including to reproduce or adapt it. According to NYT, GenAI tools rely on LLMs that contain unauthorised copies of NYT content and these tools “can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples”. GenAI tools provide infringing content on a massive scale, says NYT, and, at times, misinformation.

In what way are copies of NYT’s copyright works alleged to have been made by MS and OAI?

In setting out its claim for copyright infringement, The New York Times explains it in this way:

  • GenAI tools rely on LLMs “that were built by copying and using millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more”. The models behind ChatGPT were trained on several datasets. According to NYT, Common Crawl was the most heavily weighted of these datasets, and within it NYT is the biggest proprietary source and the third-biggest source overall, behind only Wikipedia and a database of U.S. patent documents. Unauthorised copies of NYT content are also made, NYT explains, when MS crawls the web to generate search results for its Bing Chat and Browse with Bing products;
  • copies of copyrighted works are encoded into the parameters of the LLMs; and
  • copies of copyrighted works are displayed in GPT product outputs by showing copies or derivatives retrieved from the LLMs.

NYT explains that copyright protection underpins its business model: were it not able to control and monetise its copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, etc., through paywalls and commercial licensing, it would have less revenue and fewer journalists, and storytelling would be chilled, at great cost to society.

But, surely, defendants of this calibre and level of sophistication would not have gone ahead with their innovation without getting “the necessary clearances” for “third party content”? NYT is no stranger to licensing its content, and has its own clearinghouse that licenses content to corporate and academic users.

NYT claims that when it discovered that MS and OAI were using NYT content without permission, NYT tried to reach an agreement with MS and OAI on “fair value for the use of its content” and also to “facilitate the continuation of a healthy news ecosystem” and “help develop GenAI technology in a responsible way that benefits society and supports a well-informed public”.  No resolution was reached.

The New York Times anticipates MS and OAI raising a fair use defence. In the United States, the judicial doctrine of fair use is a fundamental and well-established limitation on the exclusive rights of copyright owners. The United States Copyright Act sets out a non-exhaustive list of factors to be considered in determining whether the use made of a protected work falls within the scope of this defence, including the purpose and character of the use. In considering this factor, an assessment is made as to whether the new work is transformative in the sense that it adds something new, with a further purpose or different character, or whether it merely supersedes the objects of the original creation.

NYT makes the following submission:

“Defendants insist that their conduct is protected as ‘fair use’ because their unlicensed use of copyrighted content to train GenAI models serves a new ‘transformative’ purpose. But there is nothing ‘transformative’ about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it. Because the outputs of Defendants’ GenAI models compete with and closely mimic the inputs used to train them, copying Times works for that purpose is not fair use.”

NYT also claims that MS and OAI’s conduct constitutes unfair competition. According to the New York Times, the GenAI tools generate “hallucinations”, or misinformation, and falsely attribute these to NYT, and they also deprive NYT of affiliate referral revenue.

NYT content for its publication “Wirecutter” contains links to NYT’s merchant partners, who are obliged to give NYT a portion of the sale price on completion of a sale initiated by clicking on a prominent hyperlink. Synthetic search results generated by GenAI tools display NYT content but omit the links to NYT’s merchant partners and, in this way, deprive NYT of affiliate referral revenue (i.e. commission) from those partners. These search results also falsely attribute recommendations to NYT and create the perception that its recommendations are unreliable, and in this way erode consumer trust and damage the Wirecutter brand and NYT’s reputation.

It’s worth watching this case.