The Great Pretender

AI doesn’t know the answer, and it hasn’t learned how to care

Commentary

Hands holding a mask of anonymity. Polygonal design of interconnected elements.
Image Credits: Ilya Lukichev / Getty Images

There is a good reason not to trust what today’s AI constructs tell you, and it has nothing to do with the fundamental nature of intelligence or humanity, with Wittgensteinian concepts of language representation, or even disinfo in the dataset. All that matters is that these systems do not distinguish between something that is correct and something that looks correct. Once you understand that the AI considers these things more or less interchangeable, everything makes a lot more sense.

Now, I don’t mean to short-circuit any of the fascinating and wide-ranging discussions about this happening continually across every form of media and conversation. We have everyone from philosophers and linguists to engineers and hackers to bartenders and firefighters questioning and debating what “intelligence” and “language” truly are, and whether something like ChatGPT possesses them.

This is amazing! And I’ve learned a lot already as some of the smartest people in this space enjoy their moment in the sun, while from the mouths of comparative babes come fresh new perspectives.

But at the same time, it’s a lot to sort through over a beer or coffee when someone asks “what about all this GPT stuff, kind of scary how smart AI is getting, right?” Where do you start — with Aristotle, the Mechanical Turk, the perceptron or “Attention is all you need”?

During one of these chats I hit on a simple approach that I’ve found helps people get why these systems can be both really cool and also totally untrustworthy, without taking anything away from their usefulness in some domains or from the amazing conversations being had around them. I thought I’d share it in case you find the perspective useful when talking about this with other curious, skeptical people who nevertheless don’t want to hear about vectors or matrices.

There are only three things to understand, which lead to a natural conclusion:

  1. These models are created by having them observe the relationships between words, sentences and so on in an enormous dataset of text, then build their own internal statistical map of how all these millions and millions of words and concepts are associated and correlated. No one has told them: this is a noun, this is a verb, this is a recipe, this is a rhetorical device; yet these are things that show up naturally in patterns of usage.
  2. These models are not specifically taught how to answer questions, in contrast to the familiar software that companies like Google and Apple have been calling AI for the last decade. Those are basically Mad Libs with the blanks leading to APIs: every question is either accounted for or produces a generic response. With large language models, the question is just a series of words like any other.
  3. These models have a fundamental expressive quality of “confidence” in their responses. In a simple example of a cat recognition AI, the confidence score would go from 0, meaning completely sure that’s not a cat, to 100, meaning absolutely sure that’s a cat. You can tell it to say “yes, it’s a cat” if it’s at a confidence of 85, or 90, whatever produces your preferred response metric; see the sketch just after this list.
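
To make that third point concrete, here’s a minimal sketch in Python. The classifier is a stand-in that returns a made-up score; only the thresholding logic matters, and it shows how a statistical score gets flattened into a confident-sounding answer:

```python
import random

CONFIDENCE_THRESHOLD = 85  # or 90, whatever produces your preferred metric

def classify_cat(image) -> float:
    """Stand-in for a trained cat detector: returns a confidence score
    from 0 (completely sure that's not a cat) to 100 (absolutely sure
    it is). A real model would score pixel patterns against statistics
    learned from its training set; it has no concept of what a cat is."""
    return random.uniform(0, 100)  # placeholder for the model's output

def answer(image) -> str:
    confidence = classify_cat(image)
    # The threshold turns a statistical score into a flat assertion.
    # What the score measures is how closely the input resembles things
    # labeled "cat" in training, not any knowledge of cats.
    if confidence >= CONFIDENCE_THRESHOLD:
        return "yes, it's a cat"
    return "no, that's not a cat"

print(answer("photo.jpg"))
```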

So given what we know about how the model works, here’s the crucial question: What is it confident about? It doesn’t know what a cat or a question is, only statistical relationships found between data nodes in a training set. A minor tweak would have the cat detector equally confident the picture showed a cow, or the sky, or a still life painting. The model can’t be confident in its own “knowledge” because it has no way of actually evaluating the content of the data it has been trained on.

The AI is expressing how sure it is that its answer will appear correct to the user.

This is true of the cat detector, and it is true of GPT-4 — the difference is a matter of the length and complexity of the output. The AI cannot distinguish between a right and wrong answer — it can only make a prediction of how likely a series of words is to be accepted as correct. That is why it must be considered the world’s most comprehensively informed bullshitter rather than an authority on any subject. It doesn’t even know it’s bullshitting you — it has been trained to produce a response that statistically resembles a correct answer, and it will say anything to improve that resemblance.

The AI doesn’t know the answer to any question, because it doesn’t understand the question. It doesn’t know what questions are. It doesn’t “know” anything! The answer follows the question because, extrapolating from its statistical analysis, that series of words is the most likely to follow the previous series of words. Whether those words refer to real places, people, events, etc. is not material — only that they are like real ones.
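
Here’s what that extrapolation looks like at its most stripped-down: a toy bigram model that learns which word tends to follow which from a made-up three-sentence corpus (an illustration only; real systems use enormous datasets and deep networks, but the principle carries over).

```python
from collections import Counter, defaultdict

# Toy corpus; a real training set would be millions of documents.
corpus = (
    "the capital of france is paris . "
    "the capital of spain is madrid . "
    "the capital of france is paris ."
).split()

# Build the statistical map: count which word follows which.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def most_likely_next(word: str) -> str:
    """Return the word that most often followed `word` in training,
    a prediction about word patterns, not a claim about the world."""
    return following[word].most_common(1)[0][0]

# A "question" is just a series of words like any other; the model
# extends it with whatever is statistically most likely to come next.
prompt = "the capital of france is".split()
print(most_likely_next(prompt[-1]))  # -> "paris"
```

The toy model completes the prompt with “paris” not because it knows anything about France, but because that word most often followed “is” in its tiny training set. Train it on a skewed corpus and it will complete the sentence just as confidently with something false.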

It’s the same reason AI can produce a Monet-like painting that isn’t a Monet — all that matters is it has all the characteristics that cause people to identify a piece of artwork as his. Today’s AI approximates factual responses the way it would approximate “Water Lilies.”

Now, I hasten to add that this isn’t an original or groundbreaking concept — it’s basically another way to explain the stochastic parrot, or the undersea octopus. Those problems were identified very early by very smart people and represent a great reason to read commentary on tech matters widely.

But in the context of today’s chatbot systems, I’ve just found that people intuitively get this approach: the models don’t understand facts or concepts, only relationships between words, and their responses are an “artist’s impression” of an answer. Their goal, when you get down to it, is to fill in the blank convincingly, not correctly. This is the reason why their responses fundamentally cannot be trusted.

Of course sometimes, even a lot of the time, its answer is correct! And that isn’t an accident: For many questions, the answer that looks the most correct is the correct answer. That is what makes these models so powerful — and dangerous. There is so, so much you can extract from a systematic study of millions of words and documents. And unlike recreating “Water Lilies” exactly, there’s a flexibility to language that lets an approximation of a factual response also be factual — but also make a totally or partially invented response appear equally or more so. The only thing the AI cares about is that the answer scans right.

This leaves the door open to discussions around whether this is truly knowledge, what if anything the models “understand,” if they have achieved some form of intelligence, what intelligence even is and so on. Bring on the Wittgenstein!

Furthermore, it also leaves open the possibility of using these tools in situations where truth isn’t really a concern. If you want to generate five variants of an opening paragraph to get around writer’s block, an AI might be indispensable. If you want to make up a story about two endangered animals, or write a sonnet about Pokémon, go for it. As long as it is not crucial that the response reflects reality, a large language model is a willing and able partner — and not coincidentally, that’s where people seem to be having the most fun with it.

Where and when AI gets it wrong is very, very difficult to predict because the models are too large and opaque. Imagine a card catalog the size of a continent, organized and updated over a period of a hundred years by robots, from first principles that they came up with on the fly. You think you can just walk in and understand the system? It gives a right answer to a difficult question and a wrong answer to an easy one. Why? Right now that is one question that neither AI nor its creators can answer.

This may well change in the future, perhaps even the near future. Everything is moving so quickly and unpredictably that nothing is certain. But for the present this is a useful mental model to keep in mind: The AI wants you to believe it and will say anything to improve its chances.
