Cloudflare is accusing AI startup Perplexity of using stealth crawlers to bypass website restrictions and access content in violation of web crawling norms.
In a recent blog post, Cloudflare alleges that Perplexity deployed undeclared bots designed to slip past traditional defenses and avoid detection. The company says the activity spans millions of automated requests and has triggered updated countermeasures.
Unauthorized access to private domains
According to Cloudflare, it began investigating Perplexity after receiving reports from website operators who had blocked the company’s official crawlers but continued seeing their content appear in Perplexity’s results. To test the claims, Cloudflare created newly registered, undiscoverable domains and configured them to deny access to all bots.
Despite these protections, Cloudflare says Perplexity was still able to retrieve and surface specific content from the restricted test sites. The company alleges that Perplexity bypassed both robots.txt directives and web application firewall (WAF) rules in doing so.
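To illustrate what a deny-all robots.txt directive asks of a crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `is_allowed` helper, the bot name, and the URL are illustrative assumptions, not Cloudflare's actual test configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows every crawler from every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the URL are placeholders for illustration.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.test/page"))
```

A well-behaved crawler runs a check like this before each fetch. The file itself is advisory: nothing in robots.txt prevents a client that skips the check from requesting the page, which is why operators pair it with enforcement such as WAF rules.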
Bots disguised as browsers
Cloudflare says the content was accessed using undisclosed bots that did not identify themselves as belonging to Perplexity. These crawlers reportedly posed as ordinary browsers by mimicking common user agents such as Chrome on macOS.
The traffic also came from IP addresses outside of Perplexity’s documented range. Cloudflare states the bots rotated through different IPs and even changed Autonomous System Numbers (ASNs) to avoid detection and blocking.
Cloudflare attributes millions of these stealth requests to Perplexity each day, spread across tens of thousands of domains. The company claims it was able to fingerprint the activity using network signals and machine learning.
Perplexity’s web crawlers
Perplexity states that it uses two bots: one for search indexing and another to fetch content in response to user questions. Both operate under declared user agents, respect published IP ranges, and are not used to train AI models.
These crawlers are documented on Perplexity’s website, but Cloudflare’s allegations center on traffic coming from undeclared sources, outside the scope of what the company publicly describes.
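The difference between declared and undeclared traffic can be sketched as a simple check a site operator might run. This is a rough illustration only: the user agent token, the IP range (a reserved documentation block), and the `classify_request` helper are hypothetical and do not reflect Perplexity's real published values or Cloudflare's detection method.

```python
import ipaddress

# Hypothetical values for illustration; not any vendor's real token or ranges.
DECLARED_UA_TOKEN = "ExampleBot"
PUBLISHED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def classify_request(user_agent: str, remote_ip: str) -> str:
    """Classify a request by its declared user agent and source IP."""
    ip = ipaddress.ip_address(remote_ip)
    in_range = any(ip in network for network in PUBLISHED_RANGES)
    declares = DECLARED_UA_TOKEN.lower() in user_agent.lower()
    if declares and in_range:
        return "declared"             # names itself and comes from a listed IP
    if declares:
        return "spoofed-or-unlisted"  # names itself but the IP is not listed
    return "undeclared"               # indistinguishable from any other client here


if __name__ == "__main__":
    print(classify_request("Mozilla/5.0 (compatible; ExampleBot/1.0)", "192.0.2.10"))
    print(classify_request("Mozilla/5.0 (Macintosh) Chrome/124.0", "198.51.100.7"))
```

A request that presents a generic browser user agent from an unlisted address falls into the last bucket, so a check of this kind cannot attribute it to anyone. That limitation is consistent with Cloudflare's statement that it relied on network signals and machine learning rather than on user agent strings or IP lists alone.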
Concerns about how Perplexity accesses content aren’t new. In 2024, multiple reports claimed the company was scraping websites that had blocked bots, relying on unlisted IPs and external crawling tools. Amazon later confirmed it was reviewing whether this breached the AWS terms of service.
More recently, the BBC sent a legal letter accusing Perplexity of reproducing its content without permission and bypassing robots.txt restrictions it had placed on the company’s declared bots.
Just a sales pitch?
Perplexity disputed Cloudflare’s allegations in an email to TechCrunch. Spokesperson Jesse Dwyer called Cloudflare’s blog post a “sales pitch” and said the screenshots cited showed no content was accessed. He added that the bot named in the report is not operated by the company.
In other cybersecurity news, AI agents are creating insider security threat blind spots.