It didn’t take a rocket scientist to predict that Verizon would mismanage many of the properties it acquired in the Yahoo deal. After the Tumblr debacle, Verizon has moved on to Yahoo Groups. The company plans to purge all Groups in a few days, and it just banned numerous accounts that were trying to archive the data before it goes away forever.
You probably don’t use Yahoo Groups. No one does, and that’s why Verizon is killing it. That part is neither surprising nor inappropriate. Verizon correctly points out that it doesn’t make financial sense to maintain millions of groups on a service that gets almost no usage. Over the last few weeks, volunteers from Archiveteam.org mobilized to save the Yahoo Groups data. Their plan was to upload the salvaged data to the Internet Archive, but Verizon now stands in the way. The telecom giant has banned many of the archival accounts and blocked the tools the team used.
Yahoo Groups is a massive repository of information, containing text replies, photos, folders, polls, calendars, and more. There are more than 5.6 million groups, only 2.7 million of which have been indexed by the archival project. About half of those have public messages, roughly 2.1 billion in total. The team had been making good progress with automated tools that subscribed to groups using Yahoo email accounts and then downloaded the data. Now, the ban has caused the team to lose 80 percent of its data.
Verizon claims the automated scraping violates its terms of service, and as such, it won’t be lifting the ban. To be clear, Verizon is not protecting some precious resource or vital data. It’s strictly enforcing its terms of service so it can nuke all this data in a few days. Admittedly, most of the data in Yahoo Groups isn’t vitally important, but in the digital age, data is easy to create and just as easy to lose. Saving it for future generations should always be an option.
This leaves the Archive Team in a tough spot. Verizon will delete all Yahoo Groups data on December 14th. Even if the team created new accounts and started over, its automated tool (PGOffline) has been blocked at the server level. It’s impossible to manually join all these groups and download the data in the time remaining. It looks like Verizon has successfully blotted out this part of internet history.