Recently, there has been a conflict between Cloudflare and Perplexity regarding alleged stealth data scraping. Cloudflare accused Perplexity of crawling websites stealthily, even when explicitly instructed not to do so. Perplexity, on the other hand, denies these allegations.
Cloudflare Accuses Perplexity Of Stealth Data Scraping
In a recent post, Cloudflare claimed that Perplexity was aggressively scraping data from websites in a stealthy manner. Cloudflare discovered this behavior when customers reported Perplexity crawlers accessing their sites against their wishes. Cloudflare conducted experiments to verify these claims and found that Perplexity was still able to gather information from restricted domains, despite efforts to block them.
In its post, Cloudflare wrote: “We conducted an experiment by querying Perplexity AI with questions about these domains, and discovered Perplexity was still providing detailed information regarding the exact content hosted on each of these restricted domains. This response was unexpected, as we had taken all necessary precautions to prevent this data from being retrievable by their crawlers.”
Cloudflare found that Perplexity crawlers were bypassing standard rules, such as robots.txt files and allowlists, to access content on websites. While Perplexity claims that one of its crawlers, Perplexity-User, may ignore robots.txt rules under certain circumstances, Cloudflare observed these crawlers using deceptive tactics to access blocked content.
Cloudflare compared Perplexity’s practices with those of OpenAI’s ChatGPT and found that the ChatGPT-User crawler followed best practices by respecting disallow directives.
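To make the idea of respecting robots.txt concrete, here is a minimal sketch of the check a well-behaved crawler could run before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt contents, the example.com URLs, the SomeOtherBot name, and the is_allowed helper are all hypothetical and chosen for illustration; they are not taken from Cloudflare's test domains or from any crawler's actual code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only: it blocks one named
# crawler from the whole site and blocks everyone else from /private/.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("PerplexityBot", "https://example.com/"),
        ("SomeOtherBot", "https://example.com/index.html"),
        ("SomeOtherBot", "https://example.com/private/page"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

A check like this is voluntary: robots.txt only states the site owner's wishes, and it is up to each crawler to run the check and honor the result. That is why a crawler that hides its identity or skips the check can still reach the content.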
Perplexity Denies Cloudflare’s Allegations
Perplexity denied Cloudflare’s allegations, calling the company’s blog post a “sales pitch.” Perplexity spokesperson Jesse Dwyer denied any association with the bot described in Cloudflare’s post. However, concerns about Perplexity’s data scraping practices persist: the Japanese newspaper Yomiuri Shimbun has filed a lawsuit against Perplexity for alleged copyright infringement.
Yomiuri Shimbun seeks damages of $14.7 million for the unauthorized use of 120,000 articles by Perplexity. The outcome of this lawsuit could have implications for how AI services access and utilize online information.