Tech giant Microsoft has reportedly pulled from the Internet a massive facial recognition database containing more than 10 million images of roughly 100,000 people, but traces of the data trove remain online, according to media reports.
Microsoft released MS-Celeb-1M in 2016. The dataset comprised roughly 10 million photos of 100,000 individuals, collected from the Internet.
The database was designed to contain photos of celebrities, but as Berlin-based researcher Adam Harvey pointed out with his project Megapixels, the definition of “celebrity” was quite broad, Vice reported on Friday.
The database reportedly contained photos of “journalists, artists, musicians, activists, policy makers, writers, and academics”.
Microsoft said that the database was taken down simply because the research challenge had ended. Even so, it is doubtful that the MS-Celeb-1M database’s life is over as well.
According to the Vice report, while the msceleb.org website is no longer accessible, the dataset itself is still available on a number of GitHub repositories. In a post on Megapixels, Harvey wrote, “Despite the recent termination of the msceleb.org website, the dataset still exists in several repositories on GitHub, the hard drives of countless researchers, and will likely continue to be used in research projects around the world.”
He added that “it’s fairly clear that Microsoft has lost control of their MS Celeb dataset and biometric data of nearly 100,000 individuals.”
Several of the people included in the dataset were not asked for their consent; instead, their images were scraped from the Internet under the Creative Commons license, the Vice report added.