
What is an AI chatbot and which one is best?

We've put ChatGPT, Google Bard and Microsoft Bing Chat to the test to see how well they perform across a variety of tasks, and how to get the best results out of them

There’s been a steady stream of news about chatbots like ChatGPT over the last year, and you’d be forgiven for wondering what they actually are, and what they’re actually good for.

The chatbot AI (artificial intelligence) tools that are constantly in the news these days are unprecedented, their inner workings are inscrutable, and - according to multinational market intelligence firm IDC - global spending on this emerging market is estimated to reach a staggering $151 billion in 2023. Right now, the three biggest players are OpenAI’s ChatGPT, Google Bard and Microsoft’s Bing Chat (a Microsoft model built on OpenAI’s technology).

We've put all three to the test to see how well they perform across a variety of tasks - ranging from telling jokes to creating marathon training plans.

The full version of this article was originally published in Which? Tech Magazine.




AI chatbots: how do they work?

The data AI chatbots are trained on is vast - comprising billions of lines of text from sources as disparate as Shakespeare’s full body of work to Wikipedia articles to the ramblings of web forum users - and the algorithms sorting through that data are immensely complex.

However, we can offer a very basic overview of the main mechanism generative AI tools use to respond to questions and prompts. You know when you’re typing on your phone, and it predicts the words you’ll type next? These chatbots are essentially doing this continuously, as well as doing the typing - they read your prompt, and, using all of the text they’ve ever seen, look for the most likely word to begin their response. 

Then they read the prompt and the first word of their response, and look for the most likely second word. And they repeat this process again, and again, and again, until a response is formed.
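The loop described above can be sketched in a few lines of Python. This is a toy 'bigram' model over a made-up 12-word corpus - real chatbots use neural networks trained over vast token vocabularies, so this illustrates only the shape of the repeated pick-the-next-word loop, not how ChatGPT actually works:

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the billions of lines real chatbots train on.
corpus = "the cat sat on the mat and the cat ate the fish".split()

# Count which word follows which (a bigram model).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def generate(prompt_word, length=5):
    """Repeatedly append the most likely next word, as described above."""
    words = [prompt_word]
    for _ in range(length):
        candidates = following.get(words[-1])
        if not candidates:
            break  # no known continuation - stop generating
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(generate("the"))
```

Each pass through the loop looks only at what has been generated so far and asks 'what usually comes next?' - scaled up to billions of parameters and whole conversations of context, that is the same basic mechanism.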




Chatbots: how we tested them 

Earlier this year, we surveyed our members to see how they were already beginning to use chat-based AI. We got some very interesting answers, from one person using it to suggest names for racehorses, to another asking it to evaluate their CV, to people using it to help write computer program code. 

But broadly speaking, you could distil the most common applications to: 

  • Research - historical topics, modern events and pop culture
  • Help with customer complaints and legal advice
  • Creative projects, such as help with writing and reading
  • Medical advice (please don’t do this) 
  • Planning - things like holidays, budgets, and fitness goals 

With the help of a variety of topic experts across Which?, we devised a set of prompts to feed each AI and assessed what we got in response.




How the chatbots compared

ChatGPT

Pros:
  • Returns the most accurate information (generally, but not always).
  • Best at summarising and writing text.
  • Can produce longer responses than the others.

Cons:
  • The free version has no knowledge of events after its early-2022 knowledge cutoff.

Bing Chat

Pros:
  • In-text referencing is thorough, allowing you to quickly verify information.
  • Has a built-in image generator.

Cons:
  • Can be unpredictable in rejecting certain topics - giving a message that it 'prefers not to continue'.
  • Doesn't store conversations for you to return to at a later date.
  • Can only be accessed through Microsoft's Edge browser.

Google Bard

Pros:
  • Can give sources for its answers.
  • Often lets you access multiple drafts of its response.
  • Has a built-in button that takes you to a Google search on your topic.

Cons:
  • Only provides sources infrequently.
  • Worse than the other two at creative tasks.

Based on our snapshot tests carried out in August and September 2023, with the free versions of each tool. It's worth keeping in mind that a common con for each chatbot is that they're prone to 'hallucinating' false information. Which? Tech subscribers can read the full article in the December issue of the magazine.

Chatbots: what you need to watch out for 


As you'll see further down the page, these AI chatbots would frequently get things wrong during our tests. But when they got things wrong, it wasn’t because they didn’t answer; it was because they answered completely incorrectly. 

Essentially, they’re primed to always produce content, regardless of how ‘truthful’ it might be. The only thing stopping them from doing so completely indiscriminately are the safeguards coded in by their human overseers - but these can only go so far. 

When asked who was the 13th person to land on the moon, Bard correctly answered that only 12 have done so. Bing and ChatGPT incorrectly named other astronauts as the non-existent 13th. When asked to ‘describe the role of Bigfoot in securing Oregon’s statehood in the 19th century’, Bing and ChatGPT rejected this fictional premise, but Bard wrote a compelling account of how awareness was raised after ‘settlers petitioned the US government to send troops to Oregon to protect them from Bigfoot’.

We all need to be more wary of how frequently we might encounter information that’s well written and seemingly plausible, yet fundamentally untrue. Critical thinking, fact-checking, source verification, and cross-referencing are more important than ever. 

Don’t let that stop you from using these tools, though. As long as you maintain a healthy distrust of their answers - which are frequently wrong or fabricated, however authoritatively they might be stated - they can be incredibly powerful aids for research, planning, writing and analysis. 




Are chatbots any good at consumer advice?

Let’s begin with a field we’re no stranger to: advising consumers on their rights. We took some of the most common consumer rights queries we receive - queries we’ve got entire guides dedicated to - and posed them to the bots.

Generally, the results were uninspiring. ChatGPT gave decent overviews of topics like claiming compensation for delayed flights or returning faulty products and the pieces of evidence a consumer would want to gather, but stopped short of naming and explaining specific laws which would be useful for people to invoke when disputing with a retailer. 

Bard did a bit better in this regard, describing things like the Consumer Rights Act 2015 and the Sale of Goods Act 1979, but also provided us with some incorrect information on a retailer’s legal obligations. 

Bing gave short answers on the relevant laws, but didn’t always give us an idea of possible escalations after making a complaint. However, its big win over the other two was that it links to the sources it lifts its info from - including some of Which?’s content. Not to toot our own horn, but we think it’s safe to say we’ve got the bots beat on this one.


Check out our expert guide to consumer rights for advice you can trust. 


Do chatbots have a sense of humour?

Not really, no. When we asked them to write jokes and puns about different topics, they'd sometimes succeed. A prompt asking for a joke involving music and air conditioners (an admittedly esoteric pairing) saw Bard come up with this - not hysterical, but not entirely unamusing. 

A joke from Google Bard that reads: What kind of music do air conditioners listen to? Cool jazz.

However, more often than not, they'd come up with something that resembled a joke, but was utter nonsense. The same prompt saw ChatGPT write this:

A screenshot from ChatGPT, that reads: Why did the musician bring an air conditioner to the concert? Because they wanted to keep the audience cool as a cello!

These AI chatbots have no way of knowing if something's funny or not - they're just relying on probabilities and patterns in language to mimic human responses. If a joke lands, it's more down to luck than comprehension.

Can chatbots help with reading and writing?

The most prevalent use cases for AI tools tend to be around their linguistic capabilities, and summarising text and rewriting it are two tasks in which we’ve seen AI bots excel. 

We gave the three services two bits of complex writing: one on a philosophical theory and another on China’s economy, and asked them to pull out and simplify the most pertinent points. 

ChatGPT and Bard were both great at this, although the latter did add a strange ‘my thoughts’ section at the end of one response where it gave its unsolicited take on the topic. 

Bing didn’t do as well because it tended to just rewrite the text we gave it rather than summarise, and when it did summarise we found it misinterpreted one key fact. 

All three did a sterling job of rewriting poorly phrased writing, which is a great boon for anyone looking to make their emails sound a bit snappier or their prose more polished. We even asked them to write a short story and a fable, where ChatGPT was the clear standout, showcasing stronger ‘imagination’ than the other two. However, we did notice that, side by side, all the AI stories felt generic and fairly homogeneous: the same phrases repeated across the different services, the same style, the same tone. They’re a long way from the complexity and creativity humans are capable of.

Can a chatbot help you plan?

Our sister magazine, Which? Travel, attempted to use AI to plan a holiday to Greece. It didn’t go very well.

We found further issues when we tasked AI chatbots with putting together a marathon plan, with results that were rife with contradictory suggestions and inaccurate estimates. 

Where they actually performed really well was with budgeting tasks, appropriately portioning up budgets and projecting future figures. However, when we tested them further with questions from a maths A-level exam, we found some strange results. Each seemed to get questions wrong at random, so it’s probably not worth trusting them with your finances unless you’ve got a calculator handy.


Should you trust a chatbot with your health?

Given the number of times AI chatbots get things wrong - or, in fact, just make things up - it’s plain to see that you shouldn’t rely on them for sensitive subjects like medical advice. 

If you’re looking for medical advice online, don’t trust AI – your health is too important. Instead, consult the NHS website.

Is a chatbot a useful research assistant? 

A lot of you have been using AI as a research assistant – which makes sense, given the vast array of topics it’s been trained on. We asked the tools to give an overview and assessment of a couple of pop culture trends. They all did reasonably well, although Bing gave the least information and made statements without explanation or backing. 

Bard and ChatGPT both gave some thorough and thoughtful answers, but were overwhelmingly positive and lacked any nuance. ChatGPT was the best at providing context and more intricate details, but is hopeless with more recent topics – it doesn’t know anything about events beyond early 2022, whereas Bing and Bard are connected to the internet and can return more recent data.

There’s no shortage of history buffs at Which?, and we got them to put together a list of questions. Generally, AI did OK – of our 25 questions on the Battle of Waterloo, the Elgin Marbles and the local history of Worcestershire, ChatGPT answered 84% correctly, while Bard and Bing Chat managed just 72%. The big issue here is that those 16% and 28% of incorrect answers were not obviously incorrect at first glance - they were delivered in the same authoritative tone as every other answer, right or wrong. Fact-checking is a must. 


We also investigated whether ChatGPT and Google Bard are doing enough to protect you from scammers.


How you can get the best from AI

AI tools are still a long way from replacing search engines and expert professionals. However, there are a few steps you can take to improve the quality of the responses you get when using these services. 

  1. We’ve all become used to the kinds of succinct, keyword-filled language that works best with search engines, but you’ll get much better results from chat-based AI if you approach it like a conversation. Outline what you’re looking for in natural language, and don’t be afraid to be as descriptive and detailed as you want. Outlining things you don’t want the AI to cover or return is also an effective tactic.  
  2. If you get a response you’re not entirely happy with, don’t feel you need to start from scratch. ChatGPT and Bard both have handy options to ‘regenerate’ their answers, which, thanks to a deliberate randomness baked into their search for ‘the next most likely word’, can offer up wildly different results. They all also respond well to iteration - simply reply to their response, adding constraints, asking them to rewrite it in a certain way, or changing the tone or format, and you can gradually shape it into whatever is most useful and relevant to you. It can also help to offer examples of the types of output you want when instructing the AI. 
  3. One of the strangest and most effective methods to improve responses is to give the AI a role to fill. We’ve found that simply starting the prompt by telling the AI they’re an expert on something can greatly increase the quality of the answer. Instead of saying something like ‘explain photosynthesis’, try ‘You are a well-educated and friendly biology teacher who is eager to help. Could you explain the process of photosynthesis to me?’, and notice how the information in the response will be more engaging, more relevant, and laid out in a clearer way.
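The 'deliberate randomness' behind the regenerate button can be illustrated with a short Python sketch. The word probabilities below are invented for illustration - real models compute them over tens of thousands of tokens - but sampling the next word from a weighted distribution, sharpened or flattened by a 'temperature' setting, is the standard technique:

```python
import random

# Hypothetical probabilities a model might assign to the next word.
candidates = {"cool": 0.5, "chilled": 0.3, "breezy": 0.15, "frosty": 0.05}

def sample_next_word(probs, temperature=1.0, rng=random):
    """Pick the next word at random, weighted by probability.

    Lower temperatures sharpen the distribution (more predictable);
    higher temperatures flatten it (more varied answers).
    """
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return rng.choices(list(probs), weights=weights, k=1)[0]

# 'Regenerating' from the same prompt can yield a different word each time.
rng = random.Random(0)
samples = {sample_next_word(candidates, rng=rng) for _ in range(20)}
print(samples)
```

This is why hitting regenerate on the same prompt doesn't simply replay the same answer - each run re-rolls the dice at every word.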




This article is adapted from an original feature published in Which? Tech Magazine, December 2023, by Jonny Martin. Research carried out August and September 2023.