
OpenAI claims New York Times copyright lawsuit is without merit


[Image: pattern of OpenAI logos. Image Credits: Bryce Durbin / TechCrunch]

In late December, The New York Times sued OpenAI and its close collaborator and investor, Microsoft, for allegedly violating copyright law by training generative AI models on the Times’ content. Today, OpenAI gave a public response, claiming — unsurprisingly — that the Times’ lawsuit is meritless.

In a letter published this afternoon on OpenAI’s official blog, the company reiterates its view that training AI models using publicly available data from the web — including articles like the Times’ — is fair use. In other words, in creating generative AI systems like GPT-4 and DALL-E 3, which “learn” from billions of examples of artwork, ebooks, essays and more to generate human-like text and images, OpenAI believes that it isn’t required to license or otherwise pay for the examples — even if it makes money from those models.

“We view this principle as fair to creators, necessary for innovators and critical for U.S. competitiveness,” OpenAI writes.

In its letter, OpenAI also addresses regurgitation, the phenomenon where generative AI models spit out training data verbatim (or near-verbatim) when prompted in a certain way, for example by generating a photo that’s identical to one taken by a famous photographer. OpenAI makes the case that regurgitation is less likely to occur with training data from a single source (e.g., The New York Times) and places the onus on users to “act responsibly” and avoid intentionally prompting its models to regurgitate.

“Interestingly, the regurgitations The New York Times [cites in its lawsuit] appear to be from years-old articles that have proliferated on multiple third-party websites,” OpenAI writes. “It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.”

OpenAI’s response comes as the copyright debate around generative AI reaches a fever pitch.

In a piece published this week in IEEE Spectrum, noted AI critic Gary Marcus and Reid Southen, a visual effects artist, show how AI systems, including DALL-E 3, regurgitate data even when not specifically prompted to do so — making OpenAI’s claims to the contrary less credible. Marcus and Southen, in fact, make reference to The New York Times lawsuit in their piece, noting that the Times was able to elicit “plagiaristic” responses from OpenAI’s models simply by giving the first few words from a Times story.

The Times is only the latest copyright holder to sue OpenAI over what it believes is a clear violation of IP laws.

Actress Sarah Silverman joined a pair of lawsuits in July that accuse Meta and OpenAI of having “ingested” Silverman’s memoir to train their AI models. In a separate suit, thousands of novelists, including Jonathan Franzen and John Grisham, claim OpenAI sourced their work as training data without their permission or knowledge. And several programmers have an ongoing case against Microsoft, OpenAI and GitHub over Copilot, an AI-powered code-generating tool, which the plaintiffs say was developed using their IP-protected code.

Some news outlets, rather than fight generative AI vendors in court, have chosen to ink licensing agreements with them. The Associated Press struck a deal in July with OpenAI, and Axel Springer, the German publisher that owns Politico and Business Insider, did likewise in December. OpenAI also has deals in place with the American Journalism Project and NYU.

But the payouts tend to be quite small. According to The Information, OpenAI — whose annualized revenue reportedly hovers around $1.6 billion — offers between $1 million and $5 million a year to license copyrighted news articles to train its AI models.

Until recently, The New York Times, too, had been in conversations with OpenAI to establish a “high-value” partnership involving “real-time display” of its brand in ChatGPT, OpenAI’s AI-powered chatbot. But discussions broke down in mid-December, according to OpenAI.

For what it’s worth, the public might be on publishers’ side. According to a recent poll from the independent think tank The AI Policy Institute, when informed about the details of The New York Times lawsuit against OpenAI, 59% of respondents agreed that AI companies shouldn’t be allowed to use publisher content to train models, while 70% said that the companies should compensate outlets if they want to use copyrighted materials in model training.
