Cloudflare launches a tool to combat AI bots

Cloudflare, the publicly traded cloud service provider, has launched a new, free tool to prevent bots from scraping websites hosted on its platform for data to train AI models.

Some AI vendors, including Google, OpenAI and Apple, allow website owners to block the bots they use for data scraping and model training by amending their site’s robots.txt, the text file that tells bots which pages they can access on a website. But, as Cloudflare points out in a post announcing its bot-combating tool, not all AI scrapers respect this.
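For illustration, here is a minimal sketch of what such an opt-out might look like and how a well-behaved crawler would interpret it. The user-agent tokens (GPTBot, Google-Extended, Applebot-Extended) are the ones these vendors document for their AI crawlers; the `example.com` URL and the `allowed` helper are hypothetical, used only for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block three AI training crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str = "https://example.com/article") -> bool:
    """Return True if robots.txt permits `user_agent` to fetch `url`.

    Hypothetical helper: it parses the in-memory rules above rather than
    fetching a live file.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "Google-Extended", "Applebot-Extended", "Mozilla/5.0"):
        print(agent, "allowed" if allowed(agent) else "blocked")
```

The check is purely advisory: it only works if the crawler runs it. A scraper that never consults robots.txt, or that presents itself as an ordinary browser, is unaffected by these rules.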

“Customers don’t want AI bots visiting their websites, and especially those that do so dishonestly,” the company writes on its official blog. “We fear that some AI companies intent on circumventing rules to access content will persistently adapt to evade bot detection.”

So, in an attempt to address the problem, Cloudflare analyzed AI bot and crawler traffic to fine-tune automatic bot detection models. The models consider, among other factors, whether an AI bot might be trying to evade detection by mimicking the appearance and behavior of someone using a web browser.

“When bad actors attempt to crawl websites at scale, they generally use tools and frameworks that we are able to fingerprint,” Cloudflare writes. “Based on these signals, our models [are] able to appropriately flag traffic from evasive AI bots as bots.”

Cloudflare has set up a form for hosts to report suspected AI bots and crawlers and says that it’ll continue to manually blacklist AI bots over time.

The problem of AI bots has come into sharp relief as the generative AI boom fuels the demand for model training data.

Many sites, wary of AI vendors training models on their content without alerting or compensating them, have opted to block AI scrapers and crawlers. Around 26% of the top 1,000 sites on the web have blocked OpenAI’s bot, according to one study; another found that more than 600 news publishers had blocked the bot.

Blocking isn’t a surefire protection, however. As alluded to earlier, some vendors appear to be ignoring standard bot exclusion rules to gain a competitive advantage in the AI race. AI search engine Perplexity was recently accused of impersonating legitimate visitors to scrape content from websites, and OpenAI and Anthropic are said to have at times ignored robots.txt rules.

In a letter to publishers last month, content licensing startup TollBit said that, in fact, it sees “many AI agents” ignoring the robots.txt standard.

Tools like Cloudflare’s could help, but only if they prove accurate at detecting clandestine AI bots. And they won’t solve the more intractable problem that publishers risk sacrificing referral traffic from AI tools like Google’s AI Overviews, which exclude sites that block specific AI crawlers.
