Reddit to Perplexity: Get your filthy hands off our forums

Reddit Escalates Legal War Against AI Data Scrapers in Landmark Copyright Case

Reddit Files Federal Lawsuit Against Perplexity AI and Data Providers

Reddit has intensified its legal campaign against unauthorized data scraping with a federal lawsuit targeting Perplexity AI and three data service providers. The complaint, filed in the Southern District of New York, alleges systematic copyright infringement and illegal circumvention of technological protections to harvest Reddit’s user-generated content.

The social media platform accuses Oxylabs UAB, AWM Proxy, and SerpApi of operating as “data dealers” that illegally bypassed both Reddit’s and Google’s security measures. According to the filing, these companies enabled Perplexity to access and utilize Reddit content without entering into proper licensing agreements.

The Data Laundering Economy Exposed

Reddit’s Chief Legal Officer Ben Lee described an emerging “industrial scale data laundering economy” driven by AI companies’ insatiable appetite for quality human-generated content. “Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material,” Lee stated. “Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created.”

The lawsuit portrays Reddit as facing sophisticated scraping operations that employ advanced evasion techniques. “Unable to scrape Reddit directly, they mask their identities, hide their locations, and disguise their web scrapers to steal Reddit content from Google Search,” Lee explained.

Defendants Accused of Systematic Evasion

Reddit’s complaint details how the three data providers allegedly collaborated to circumvent protections. Oxylabs UAB, a Lithuania-based scraping service; AWM Proxy, described as a “former Russian botnet”; and SerpApi, which offers access to scraped Google results, are characterized as textbook examples of illegal data harvesting operations.

The legal filing employs vivid analogies, comparing the defendants to “would-be bank robbers, who, knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead.” It further echoes Cloudflare CEO Matthew Prince’s characterization of Perplexity as operating like a “North Korean hacker” in its approach to data acquisition.

Legal Framework and Allegations

Reddit contends the defendants violated multiple legal provisions, including:

Digital Millennium Copyright Act violations for circumventing technological protections
Trafficking in circumvention technology, alleged specifically against SerpApi and Oxylabs
Unfair competition and unjust enrichment claims
Civil conspiracy allegations against all parties

The company is seeking both injunctive relief to stop the scraping activities and monetary damages for the unauthorized use of its content.

Broader Industry Context and Precedents

This lawsuit represents the latest escalation in the ongoing battle between content creators and AI companies over training data. Reddit previously sued Anthropic after the two companies failed to reach a licensing agreement; OpenAI, by contrast, chose to license Reddit content.

The case joins several other high-profile legal actions concerning AI training data:

The recent lawsuit against Apple alleging use of pirated books in training datasets
Millette v. OpenAI concerning YouTube video scraping
The New York Times Co. v. Microsoft Corp. and OpenAI, concerning the use of news content

Industry Responses and Defense Positions

Perplexity responded to the allegations before receiving the formal complaint, stating: “We will always fight vigorously for users’ rights to freely and fairly access public knowledge. Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”

Neither Oxylabs, which describes itself as “the largest ethical proxy network,” nor SerpApi responded to requests for comment. Google, while not participating in the lawsuit, has implemented measures to prevent automated scraping of its search results.

Implications for AI Development and Content Rights

This case highlights the fundamental tension between AI companies’ need for training data and content platforms’ rights to control and monetize their users’ contributions. The outcome could establish important precedents for how publicly accessible web content can be used in AI training and what constitutes fair compensation for content creators.

As AI companies continue to seek high-quality training data, the industry faces mounting pressure to develop sustainable licensing models that respect copyright while still enabling AI development. How these legal battles are resolved will likely shape AI development and content ownership for years to come.