On the heels of December’s New York Times investigation of a dataset of 50 billion location points harvested from smartphones comes another eye-opener on the state of personal data collection: facial recognition powered by scraping social media and other websites, packaged and sold to law enforcement. The company at the heart of this new article by privacy journalist Kashmir Hill is Clearview AI. Founded by Hoan Ton-That, Clearview AI purports to have a database of 3 billion photos scraped from YouTube, Venmo, Facebook, and other sites, offering additional information such as names and locations drawn from the source captions of the photos. The technology is currently marketed to law enforcement, and purchasers of the service include the Indiana State Police and the Gainesville Police. Hill notes that the Indiana State Police have used the application to identify a suspect in a shooting case.
Facial recognition is often recognized as a privacy issue, but not always as a geoprivacy issue. Yet there are clear geoprivacy implications to having the timestamp and location of a camera capture linked to a personal identity. Not only can X and Y coordinates be stored in a photo’s EXIF information as a geotag, but the content of the photo and its captions can provide a trail of a person’s locational history. Among the contextual location data one might glean from such a trove of stored, identified photos are events you’ve attended, places you’ve lived, and vacations you’ve taken, offering a window into your general interests and socioeconomic status. Much as facial recognition algorithms match a face against vast repositories of face images, Google holds an active patent for inferring the location of an image by comparing it to vast repositories of place images (think Google Street View). Combining the two would allow a system that can determine both who and where someone is from a single photo.
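To make the geotag point concrete: EXIF stores GPS coordinates as degree/minute/second values plus a hemisphere reference, which a scraper can trivially convert to the decimal latitude and longitude used by mapping services. A minimal sketch of that conversion (the function name and the sample coordinates are illustrative, not drawn from the article):

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style GPS degrees/minutes/seconds to signed
    decimal degrees. ref is 'N'/'S' for latitude, 'E'/'W' for
    longitude; southern and western hemispheres are negative."""
    decimal = degrees + minutes / 60 + seconds / 3600
    return -decimal if ref in ("S", "W") else decimal

# A hypothetical photo geotagged near the Eiffel Tower might carry
# GPSLatitude (48, 51, 29.6) N and GPSLongitude (2, 17, 40.2) E:
lat = dms_to_decimal(48, 51, 29.6, "N")
lon = dms_to_decimal(2, 17, 40.2, "E")
print(lat, lon)
```

In practice a library such as Pillow exposes these values through the photo’s `GPSInfo` EXIF tag; the arithmetic above is all that stands between a shared photo and a precise point on a map.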
Another issue at play is that the images in the Clearview AI database have been scraped from internet sources, in some cases in violation of the terms of service of the companies hosting the data. The California Consumer Privacy Act (CCPA) allows residents of California, like me, to ask companies to delete their personal data and to prevent its sale. However, this does not stop companies from buying or scraping data that are already out there, transferred instantly to multiple other parties upon collection. Under the CCPA, the onus still rests on individuals to contact hundreds of companies to manage their own data. We are also able to remove ourselves from the Clearview AI database, although the implementation should raise a collective worried brow: to remove your photos, you must send the company a headshot and a photo of your government ID. There is a real risk that the company retains such data.
Hill notes that Clearview AI takes all incoming photos from law enforcement and stores them on its own servers. This releases sensitive information about ongoing investigations to a private company that has already shown disregard for terms of service. There are also questions, as with much facial recognition technology, about accuracy. The company reportedly finds photo matches 75 percent of the time, and it returns matches even when the subject is wearing glasses or a hat, or shows only a partial view of the face. This raises the possibility of false identification, which is particularly serious if it leads to a false accusation of a crime. Clare Garvie, a researcher at Georgetown University’s Center on Privacy and Technology, notes in the New York Times piece that the larger the database, the larger the risk of misidentification.
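Garvie’s point about scale can be illustrated with a back-of-envelope calculation. If each comparison of a probe face against a database entry has some small chance of a spurious hit, the probability of at least one false match compounds with database size. The per-comparison rate below is purely illustrative, not a measured figure for Clearview or any real system, and the independence assumption is a simplification:

```python
def p_false_match(fmr, n):
    """Probability of at least one false match when one probe face is
    compared against n database entries, assuming each comparison is
    independent with per-comparison false match rate fmr."""
    return 1 - (1 - fmr) ** n

# Illustrative false match rate of one in a million per comparison:
fmr = 1e-6
for n in (1_000_000, 100_000_000, 3_000_000_000):
    print(f"{n:>13,} faces -> P(at least one false match) = "
          f"{p_false_match(fmr, n):.3f}")
```

Even at a one-in-a-million per-comparison rate, a single search against millions of faces is more likely than not to surface a spurious hit, and against a database the size of Clearview’s claimed 3 billion photos, a false match becomes a near-certainty under these assumptions.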
The backlash in this case has been swift. Twitter has sent a cease-and-desist letter to Clearview for violating its policies, and New Jersey police are now barred from using the application. In a separate but related case this week, Facebook agreed to a $550 million settlement in Illinois for violating the state’s biometric privacy law with its facial recognition photo tagging. The law requires companies to obtain written consent before collecting a person’s facial scans, among other biometric data. Investigations such as the Clearview AI article are highlighting insidious data practices and drumming up support for more comprehensive privacy regulation.
Dr. Dara E. Seidl is an independent researcher and GIS professional who writes about geoprivacy. In response to a need for interactive geospatial ethics training, Dara will use her fellowship to build an entertaining resource for teaching ethical collection, use, and access to location data.