
How New York Times lawsuit against OpenAI & Microsoft adds to AI regulation debate



New Delhi: The New York Times (NYT) Wednesday sued ChatGPT creator OpenAI and Microsoft — an investor that owns 49 percent of the former — over “unlawful” use of content protected by US copyright law, a first for a news publication.

In the lawsuit filed in a federal court in Manhattan, NYT has argued that, because its content was used to train the generative AI tools, the latter "can generate output that recites Times content verbatim, closely summarises it and mimics its expressive style…These tools also wrongly attribute false information to The Times."

The case could test the legal boundaries of AI in the US and set important precedents for the wider world, at a time when the technology’s explosive growth is far outpacing most regulatory regimes, including that of India.  

ThePrint looks at the lawsuit and other cases against OpenAI, as well as the wider regulatory lacunae this battle has drawn attention to. 

What are the charges?

Seeking large damages without citing a specific amount, the New York Times lawsuit says OpenAI was “built in large part on the unlicensed exploitation of copyrighted works belonging to The Times and others.” It adds that the company is now valued as high as $90 billion, with a projected revenue of over $1 billion in 2024. 

The Times, on the other hand, says it has been "deprived of subscription, licensing, advertising and affiliate revenue", and that the defendants owe "billions of dollars in statutory and actual damages" for "the unlawful copying and use of The Times's uniquely valuable works".

The lawsuit follows negotiations between the parties that began in April, NYT says. These proved inconclusive, the publisher adds, because the defendants insisted that their actions were protected as "fair use", on the grounds that the "use of copyrighted content to train GenAI models serves a new 'transformative' purpose".

The lawsuit contests this defence, saying "there is nothing transformative about using The Times's content without payment to create products that substitute for The Times and steal audiences away from it. Because the outputs of Defendants' GenAI models compete with and closely mimic the inputs used to train them, copying Times works for that purpose is not fair use."

As examples, the lawsuit shows samples of text that were allegedly extracted directly from an NYT article without providing a link to the story or the referral links that bring in money, contrasting this with the established practices of search engines. 


Also read: Global policymakers don’t understand AI enough to regulate it. Tech companies must step up now


‘A vacuum that no computer or AI can fill’

The lawsuit argues that OpenAI’s practices undermine investment in journalism, including future AI licensing.

Highlighting 100 examples of alleged copyright infringement with examples of copied content, the lawsuit cites the human effort and financial investment that goes into the paper’s “copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more”. Copying content from these “threatens The Times’s ability to provide that service”.

“If The Times and other news organisations cannot produce and protect their independent journalism, there will be a vacuum that no computer or artificial intelligence can fill,” the lawsuit says.

It further says OpenAI has "a business model based on mass copyright infringement", adding that "all of the OpenAI Defendants have been either directly involved in or have directed, controlled, and profited from OpenAI's widespread infringement and commercial exploitation of Times works, through a series of holding and shell companies which were directly involved in the design, development, and commercialisation of OpenAI's GPT-based products, and directly engaged in the widespread reproduction, distribution, and commercial use of Times works."

The lawsuit draws attention to the possible harm that "hallucination", a phenomenon in which chatbots generate incorrect information that is then falsely attributed to a source, may do to The Times's reputation.

It also highlights the importance of NYT articles as sources for ChatGPT, accounting for more than 1 percent of the sources listed in OpenWebText2, a dataset used in training GPT-3, the third version of the model behind ChatGPT.

"While Defendants engaged in wide scale copying from many sources, they gave Times content particular emphasis when building their LLMs," the lawsuit says.

It adds that NYT had blocked OpenAI’s web crawler in August to prevent it from using its stories to train its AI models. CNN and Reuters had also done the same.
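Blocking of this kind is typically done through a site's robots.txt file, which tells crawlers which paths they may fetch. A minimal sketch using Python's standard `urllib.robotparser`, assuming the `GPTBot` user agent that OpenAI documents for its web crawler (the exact directives NYT, CNN and Reuters used are not quoted in the lawsuit):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt directives of the kind publishers used to shut
# out OpenAI's crawler: GPTBot is disallowed everywhere, other crawlers
# are unaffected.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot is barred from every path; crawlers with no matching rule
# fall back to the default, which is to allow fetching.
print(parser.can_fetch("GPTBot", "https://example.com/some-article"))
print(parser.can_fetch("Googlebot", "https://example.com/some-article"))
```

Compliance with robots.txt is voluntary on the crawler's part, which is one reason publishers have also turned to contracts and to lawsuits like this one.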

Multiple lawsuits against OpenAI 

This follows a month after OpenAI co-founder and CEO Sam Altman was fired — over conflict in the organisation about whether to stick to the safety-first values upon which it was built, or to pursue a less restrictive growth strategy — before being rehired after OpenAI employees threatened to resign. 

Apart from internal problems, the company is also dealing with several lawsuits that were brought in 2023.

Author and editor Julian Sancton had filed a lawsuit against OpenAI, claiming it had used thousands of non-fiction books, including his own work, Madhouse at the End of the Earth: The Belgica's Journey into the Dark Antarctic Night, without permission to train its large language models (LLMs). Sancton was the first to implicate Microsoft as a co-defendant, even before NYT.

In September, US authors John Grisham, George R.R. Martin, Jodi Picoult and 17 others had filed a similar copyright infringement lawsuit against OpenAI for “systematic theft on a mass scale” and using copyrighted material without permission. 

That came after comedian Sarah Silverman filed a lawsuit in July, and that same month, writers Margaret Atwood and Philip Pullman signed an open letter requesting that AI companies pay them for using their creations.

A group of IT professionals has also filed a lawsuit against OpenAI, Microsoft and the code-hosting platform GitHub, claiming that their code was used without their consent to train Copilot, a Microsoft-owned AI coding tool.

Along with these legal actions, text-to-image generators Stability AI and Midjourney were sued by artists in January on the grounds that they can only produce images if they are trained on copyrighted artwork. 

Getty Images has filed lawsuits in the UK and the US against Stability AI for allegedly reproducing several of its images and metadata without permission. 

These lawsuits are still pending in court. In November this year, after being reinstated, Altman announced that OpenAI would pay the legal fees of its customers if they are sued for copyright infringement.


Also read: Global framework required to leverage the opportunities of AI, says MeitY official at tech summit


Regulatory lacunae

In the absence of comprehensive legislation specifically addressing AI concerns, countries worldwide, including India, are grappling with the challenge of regulating AI applications within the confines of existing laws. 

The European Union's proposed AI law, the world's first comprehensive AI legislation, is expected to come into force by 2025. This law categorises AI applications into four risk-based classifications: unacceptable risk, high risk, limited risk and minimal risk, an approach that matches the stringency of the legislative framework to the risk an AI system poses.

India currently relies on adapting existing laws to cover AI applications until the enactment of specific AI legislation. The proposed Digital India Act, intended to replace the Information Technology (IT) Act of 2000, is anticipated to encompass AI regulations, but is yet to be introduced.

According to the Global AI Regulation Tracker by the International Association of Privacy Professionals (IAPP), none of the top 15 world economies has enacted comprehensive AI laws.

While China has implemented sector-specific guidelines, particularly in finance and healthcare, the United States follows a decentralised model, allowing individual states to propose AI legislation. 

Cyber law expert Pavan Duggal says, “With the rapid growth of technologies, AI regulation requires a dedicated law rather than a passing reference in some act.”

The NYT vs OpenAI legal battle also brings copyright law into sharp focus.

India’s Copyright Act includes creators of computer-generated works in its definition of “author” but does not take into account the fact that AI systems depend on copyrighted content created by other authors for their training datasets.

Duggal, who is also the CEO of AI LawHub, a Delhi-based initiative "created to track, study, examine and facilitate, various aspects of Artificial Intelligence Law", adds that such complaints would hold in India as well. "It is only a question of time before such lawsuits (like the NYT case) are filed in India. Until then, companies (such as OpenAI) can be sued under the Copyright Act, if such a situation arises."


Also read: India’s internet regulation laws need judicial oversight. Govt officials can’t match judges


 


