Reddit Alleges Anthropic Scraped Data Over 100,000 Times

In Short
  • Reddit has filed a lawsuit against Anthropic, the company behind the popular Claude AI models.
  • The lawsuit claims Anthropic scraped Reddit's data more than 100,000 times since July 2024.
  • Reddit argues this contradicts the AI startup's public image of respecting boundaries and the law.

Anthropic has received much praise for the strong ethical foundation behind its Claude AI models, including how it sources the data used to train them. However, this might not be the whole truth. Reddit has filed a lawsuit against Anthropic, alleging that the company scraped its site’s data more than 100,000 times since July 2024.

Reddit filed the suit against the AI research company on Wednesday in San Francisco Superior Court. The filing states: “This case is about the two faces of Anthropic: the public face that attempts to ingratiate itself into the consumer’s consciousness with claims of righteousness and respect for boundaries and the law, and the private face that ignores any rules that interfere with its attempts to further line its pockets.”


This isn’t the first rodeo for Anthropic, as the company has faced legal trouble before. In August 2024, three authors filed a class-action lawsuit against the company in California federal court, claiming that it had built a “multibillion-dollar business by stealing hundreds of thousands of copyrighted books.”

Going back further, Universal Music Group sued Anthropic in Tennessee federal court in October 2023, alleging copyright infringement of its song lyrics. This pattern does not bode well for an AI startup that holds its morals in such high regard. We have yet to hear from Anthropic on the matter.

That said, an Anthropic spokesperson did say that the company has put Reddit on its block list, so its web crawler does not access Reddit’s content. Reddit is one of the largest pools of human-written content on the internet, which makes it a data gold mine for any company training AI models. Reddit understands the value it holds, which is why, like other content publishers, it offers licensing deals for its data.
