'Name, Shame, and Hard Block Them': Cloudflare Blasts Perplexity Over AI Website Scraping

Image Credit: Photo For Everything / Shutterstock.com

In Short

Cloudflare has caught Perplexity scraping websites that explicitly block AI crawlers.
Perplexity's AI crawlers concealed their identity and even used undisclosed IP addresses.
The AI startup was caught doing so across tens of thousands of domains, making millions of requests per day.

Perplexity has been caught red-handed by Cloudflare: the startup has been quietly scraping websites that do not want to be crawled by AI bots. Typically, AI answer engines like Perplexity or ChatGPT crawl many websites and extract text, images, and other content to generate answers, often without obtaining permission.

Cloudflare has now published its research, claiming that Perplexity conceals the identity of its crawlers to get around restrictions and scrape websites that have explicitly opted out.

Cloudflare CEO Matthew Prince blasted Perplexity on X, stating, “Some supposedly ‘reputable’ AI companies act more like North Korean hackers. Time to name, shame, and hard block them.”


Scraping of this kind hurts site traffic, which is why some websites have started using the ‘robots.txt’ file to curb AI’s free lunch. This file tells crawlers which parts of a site they may access and which they may not. But according to Cloudflare’s report, Perplexity appears to be ignoring robots.txt directives altogether.
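As a rough illustration of how a well-behaved crawler honors robots.txt, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt contents, the `may_fetch` helper, and the example URL are assumptions made up for this example and are not taken from Cloudflare's report. Note that the check runs on the crawler's side: robots.txt is a request, not an enforcement mechanism, so it only works if the crawler chooses to obey it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse one named AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    print("PerplexityBot allowed:", may_fetch("PerplexityBot", url))
    print("SomeOtherBot allowed:", may_fetch("SomeOtherBot", url))
```

A crawler that skips this check, or that runs it under a different name, sees no technical barrier from robots.txt alone.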

How Perplexity Pulled Off the Grand Theft Data

Cloudflare published the report after receiving several complaints from customers who said that Perplexity could still access their websites’ content even though they had set restrictions in their robots.txt files and created Web Application Firewall (WAF) rules to stop AI bots from scraping data.

In response to the complaints, Cloudflare created test domains with similar restrictions to observe Perplexity’s behavior. It found that Perplexity initially attempts to access sites using its declared crawlers, i.e., “PerplexityBot” or “Perplexity-User.” However, if the crawler is blocked, it switches its user agent, the string a client sends to tell a website what kind of software, such as a browser or a named bot, is making the request.
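To see why switching the user agent defeats a name-based block, consider this minimal Python sketch. It is not Cloudflare's WAF logic; the `is_blocked` helper, the block list, and both header strings are illustrative assumptions rather than captured traffic.

```python
# Hypothetical list of crawler tokens a site owner wants to refuse.
BLOCKED_TOKENS = ("PerplexityBot", "Perplexity-User")


def is_blocked(user_agent: str, blocked_tokens=BLOCKED_TOKENS) -> bool:
    """Return True if the User-Agent header contains a blocked crawler token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in blocked_tokens)


# Illustrative header values (assumptions, not real request logs).
DECLARED_UA = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
BROWSER_LIKE_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

if __name__ == "__main__":
    # A client that announces itself as the crawler is refused.
    print("declared crawler blocked:", is_blocked(DECLARED_UA))
    # The same client sending a browser-style string passes the name check.
    print("browser-like string blocked:", is_blocked(BROWSER_LIKE_UA))
```

Because the header is supplied by the client, a rule like this only stops crawlers that identify themselves honestly.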


In Perplexity’s case, it masked itself as a Chrome browser on macOS. Moreover, Perplexity used “rotating” IP addresses that do not appear on the company’s published list of IP addresses used by its bots. Cloudflare’s report also says that Perplexity switches between autonomous system numbers (ASNs), the unique identifiers assigned to large networks.
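The point about undisclosed IP addresses can be sketched the same way. Site operators often trust or block a bot by comparing the request's source address against the ranges the bot's operator publishes. The Python example below assumes a hypothetical published list (using reserved documentation ranges, not Perplexity's real ones) and a made-up `in_published_ranges` helper.

```python
import ipaddress

# Hypothetical published crawler ranges. These are reserved documentation
# networks (RFC 5737), used here only as placeholders.
PUBLISHED_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def in_published_ranges(ip: str, ranges=PUBLISHED_RANGES) -> bool:
    """Return True if ip falls inside any of the published CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)


if __name__ == "__main__":
    # An address inside the published list can be matched to the bot.
    print("192.0.2.10 listed:", in_published_ranges("192.0.2.10"))
    # An address outside the list cannot be attributed by this check alone.
    print("203.0.113.7 listed:", in_published_ranges("203.0.113.7"))
```

Requests that arrive from unlisted, rotating addresses slip past a check like this, which is presumably why attribution needs additional signals beyond a static IP list.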

Cloudflare mentions in its post, “This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals.”

Not Perplexity’s First Rodeo

Perplexity was caught doing the same thing in June last year, ignoring paywalls and robots.txt files on websites. Back then, the company’s CEO, Aravind Srinivas, blamed it all on third-party crawlers the company was relying on. But now, the situation is different, and the blame falls squarely on Perplexity itself.

In a statement to The Verge, Perplexity spokesperson Jesse Dwyer calls Cloudflare’s report a “publicity stunt.” He further adds that “there are a lot of misunderstandings in the blog post.” However, we are still waiting to hear more from Perplexity. Meanwhile, Cloudflare has delisted Perplexity as a verified bot and is rolling out new ways to block Perplexity from crawling websites.

It is also worth pointing out that Apple has reportedly been interested in buying Perplexity and was said to be in early talks. Following this report, however, the Cupertino giant may now reconsider.



Anshuman Jain

With over 4 years of experience under my belt, I cover all facets of consumer tech, from smartphones and other consumer electronics to our favorite social media apps, as well as the growing realm of AI and LLMs. As an Apps and AI writer at Beebom, I provide my expertise in all these areas, weaving stories that help you get familiar with the tech around you. In my free time, you will find me playing NYT daily puzzles.
