OpenAI breach is a reminder that AI companies are treasure troves for hackers

12:49 PM PDT • July 5, 2024

Image Credits: Bryce Durbin / TechCrunch

There’s no need to worry that your secret ChatGPT conversations were obtained in a recently reported breach of OpenAI’s systems. The hack itself, while troubling, appears to have been superficial — but it’s a reminder that AI companies have in short order made themselves into one of the juiciest targets out there for hackers.

The New York Times reported the hack in more detail after former OpenAI employee Leopold Aschenbrenner hinted at it recently on a podcast. He called it a “major security incident,” but unnamed company sources told the Times the hacker only got access to an employee discussion forum. (I reached out to OpenAI for confirmation and comment.)

No security breach should really be treated as trivial, and eavesdropping on internal OpenAI development talk certainly has its value. But it’s far from a hacker getting access to internal systems, models in progress, secret roadmaps, and so on.

But it should scare us anyway, and not necessarily because of the threat of China or other adversaries overtaking us in the AI arms race. The simple fact is that these AI companies have become gatekeepers to a tremendous amount of very valuable data.

Let’s talk about three kinds of data that OpenAI and, to a lesser extent, other AI companies have created or have access to: high-quality training data, bulk user interactions, and customer data.

It’s uncertain exactly what training data they have, because the companies are incredibly secretive about their hoards. But it’s a mistake to think they are just big piles of scraped web data. Yes, they do use web scrapers or datasets like the Pile, but shaping that raw data into something that can be used to train a model like GPT-4o is a gargantuan task. It requires a huge number of human work hours; the process can only be partially automated.

Some machine learning engineers have speculated that of all the factors going into the creation of a large language model (or, perhaps, any transformer-based system), the single most important one is dataset quality. That’s why a model trained on Twitter and Reddit will never be as eloquent as one trained on every published work of the last century. (And probably why OpenAI reportedly used questionably legal sources like copyrighted books in its training data, a practice it claims to have given up.)

So the training datasets OpenAI has built are of tremendous value to competitors and adversary states alike, and even to regulators here in the U.S. Wouldn’t the Federal Trade Commission (FTC) or courts like to know exactly what data was being used, and whether OpenAI has been truthful about that?

But perhaps even more valuable is OpenAI’s enormous trove of user data — probably billions of conversations with ChatGPT on hundreds of thousands of topics. Just as search data was once the key to understanding the collective psyche of the web, ChatGPT has its finger on the pulse of a population that may not be as broad as the universe of Google users, but provides far more depth. (In case you weren’t aware, unless you opt out, your conversations are being used for training data.)

In the case of Google, an uptick in searches for “air conditioners” tells you the market is heating up a bit. But those users don’t then have a whole conversation about what they want, how much money they’re willing to spend, what their home is like, manufacturers they want to avoid, and so on. You know this is valuable because Google itself is trying to get its users to provide this very information by substituting AI interactions for searches!

Think of how many conversations people have had with ChatGPT, and how useful that information is, not just to developers of AIs, but also to marketing teams, consultants, analysts … It’s a gold mine.

The last category of data is perhaps of the highest value on the open market: how customers are actually using AI, and the data they have themselves fed to the models.

Hundreds of major companies and countless smaller ones use tools like OpenAI’s and Anthropic’s APIs for an equally wide variety of tasks. And for a language model to be useful to them, it usually must be fine-tuned on, or otherwise given access to, their own internal databases.

This might be something as prosaic as old budget sheets or personnel records (e.g., to make them more easily searchable) or as valuable as code for an unreleased piece of software. What they do with the AI’s capabilities (and whether they’re actually useful) is their business, but the simple fact is that the AI provider has privileged access, just as any other SaaS product does.

These are industrial secrets, and AI companies are suddenly right at the heart of a great deal of them. The newness of this side of the industry carries with it a special risk in that AI processes are simply not yet standardized or fully understood.

Like any SaaS provider, AI companies are perfectly capable of providing industry-standard levels of security, privacy, and on-premises options, and generally of providing their service responsibly. I have no doubt that the private databases and API calls of OpenAI’s Fortune 500 customers are locked down very tightly! These companies are surely as aware as anyone, if not more so, of the risks inherent in handling confidential data in the context of AI. (The fact that OpenAI did not report this attack is its choice to make, but it doesn’t inspire trust in a company that desperately needs it.)

But good security practices don’t change the value of what they are meant to protect, or the fact that malicious actors and sundry adversaries are clawing at the door to get in. Security isn’t just picking the right settings or keeping your software updated — though of course the basics are important too. It’s a never-ending cat-and-mouse game that is, ironically, now being supercharged by AI itself: Agents and attack automators are probing every nook and cranny of these companies’ attack surfaces.

There’s no reason to panic — companies with access to lots of personal or commercially valuable data have faced and managed similar risks for years. But AI companies represent a newer, younger, and potentially juicier target than your garden-variety, poorly configured enterprise server or irresponsible data broker. Even a hack like the one reported above, with no serious exfiltrations that we know of, should worry anybody who does business with AI companies. They’ve painted the targets on their backs. Don’t be surprised when anyone, or everyone, takes a shot.