Metadata: Reidentification Using Telephone Data Is Easier Than You Think
Following the September 11 terrorist attacks, the US government passed the USA Patriot Act in October 2001, in what members of Congress and the Bush administration said was an effort to strengthen national security. However, opponents criticized the law for including provisions that encroached on citizens’ privacy and civil liberties. One of the most controversial elements was the business records provision, which enabled the National Security Agency (NSA) to obtain a large portion of domestic telephone metadata.
As with most communications privacy protection laws, the Patriot Act distinguishes between electronic “content” and “metadata”—the former being the actual substance of the communication, and the latter being all other information about the communication: phone numbers, time and duration, and SIM card information. There is often a generous level of legal protection against the disclosure of content; however, disclosure of metadata is “often left to the near-total discretion of authorities.” Advocates of metadata surveillance argue that it does not contain personally identifiable information (PII), and therefore is not a significant privacy concern. However, a 2016 report published in the Proceedings of the National Academy of Sciences of the United States of America (PNAS) shows that, despite the lack of PII in pure telephone metadata, “reidentification” of personal and sensitive information is possible with simple, and widely available, internet techniques.
Mayer et al. used an Android smartphone application, which participants could voluntarily download, to gather the metadata for this study. Once installed, the application automatically collected historical call and text messaging information, as well as information from participants’ Facebook accounts. The researchers maintained ethical rigor through several consent safeguards, including disclosure notices, opportunities to review metadata, and multiple study exit points. They also took precautions to keep participant information secure.
First, the researchers estimated the effective reach of surveillance programs under several “hop” and retention constraints, which often dictate the legal boundaries of such programs. Starting from a suspected telephone number (known as the “seed”), the hop constraint limits how far through the network of calls and texts analysts may follow connections: under a two-hop constraint, they may retrieve metadata only for numbers up to two connections, or hops, away from the seed. The retention constraint defines how far back historical records can be accessed. The NSA’s surveillance program has a two-hop and five-year retention constraint, although a proposed revision would reduce the latter to 18 months. Based on these constraints, the study suggests that, until 2013, the program gave “analysts legal authority to access telephone records for the majority of the entire US population.” Even after the retention reduction and the removal of national hubs as connectivity points, the researchers still anticipate access to the records of approximately 25,000 subscribers from a single seed.
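To make the hop and retention constraints concrete, the sketch below treats call records as edges in a graph and runs a breadth-first search from the seed. It is an illustration only, not the study's method: the function name, the record format, and the single-letter "numbers" are all hypothetical, and the 18-month window is approximated as 548 days.

```python
from collections import deque
from datetime import datetime, timedelta


def reachable_numbers(call_records, seed, max_hops, retention, now):
    """Map each number within max_hops of the seed to its hop distance.

    call_records is an iterable of (caller, callee, timestamp) tuples
    (a hypothetical format). Records older than now - retention are ignored.
    """
    cutoff = now - retention

    # Build an undirected adjacency map from records inside the retention window.
    neighbors = {}
    for caller, callee, when in call_records:
        if when < cutoff:
            continue
        neighbors.setdefault(caller, set()).add(callee)
        neighbors.setdefault(callee, set()).add(caller)

    # Breadth-first search that stops expanding once the hop limit is reached.
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        number = queue.popleft()
        if hops[number] == max_hops:
            continue
        for other in neighbors.get(number, ()):
            if other not in hops:
                hops[other] = hops[number] + 1
                queue.append(other)

    del hops[seed]  # report only the numbers reached, not the seed itself
    return hops


# Made-up records: A-B-C-D form a chain, and A-E is an old call.
now = datetime(2016, 1, 1)
records = [
    ("A", "B", datetime(2015, 6, 1)),
    ("B", "C", datetime(2015, 3, 1)),
    ("C", "D", datetime(2015, 9, 1)),
    ("A", "E", datetime(2012, 1, 1)),
]
short_window = reachable_numbers(records, "A", 2, timedelta(days=548), now)
long_window = reachable_numbers(records, "A", 2, timedelta(days=5 * 365), now)
```

With these made-up records, the shorter window reaches only B and C, while the five-year window also reaches E. D is three hops from the seed and is excluded in both cases. In a real telephone network, each hop multiplies the number of reachable subscribers, which is why the study's estimates are so large.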
In addition to the broad reach of metadata surveillance, the study also demonstrates the ease of inferring or identifying personal information. Telephone numbers were “trivially reidentifiable” by either automated or manual search techniques. A combination of free automated searches on interfaces hosted by Yelp, Google Places, and Facebook, and manual reidentification using Google search and the commercial database Intelius matched the identity of 82 percent of phone numbers. Despite inexpensive methods and “limited resources – far below those available to a large business or intelligence agency,” researchers were able to identify “the overwhelming majority of the numbers.”
Personal and sensitive information was also predictable from limited metadata. Although metadata surveillance policies usually separate call and text records from location records, researchers correctly inferred the current city of 57 percent of study participants, as checked against the city listed on each participant’s Facebook profile. They did this by first reidentifying businesses in a participant’s phone records and then, assuming that most businesses an individual calls are close to that person’s residence, finding the median location of the largest cluster of calls. Relationship status and romantic partners were identified with accuracies consistently greater than 50 percent. Lastly, and perhaps of most concern, the researchers were able to draw particularly sensitive conclusions about participants’ lives. Logs of calls to “sensitive organizations” such as healthcare providers, religious facilities, financial services, and legal services provided enough background for researchers to deduce highly sensitive information about subjects.
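The location inference described above can be sketched in a few lines. The article does not say how the researchers clustered calls, so this example makes its own assumptions: it groups business locations that lie within a chosen radius of one another (a hypothetical 50 km default), keeps the largest group, and takes the coordinate-wise median. The function names and the coordinates are illustrative only.

```python
import math
from statistics import median


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, h)))


def largest_cluster(points, radius_km):
    """Group points linked by chains of hops <= radius_km; return the biggest group."""
    unvisited = set(range(len(points)))
    best = []
    while unvisited:
        start = unvisited.pop()
        cluster, frontier = [start], [start]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if haversine_km(points[i], points[j]) <= radius_km]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        if len(cluster) > len(best):
            best = cluster
    return [points[i] for i in best]


def estimate_home(points, radius_km=50.0):
    """Estimate a home location as the median of the largest cluster (assumed method)."""
    if not points:
        raise ValueError("need at least one business location")
    cluster = largest_cluster(points, radius_km)
    return (median(p[0] for p in cluster), median(p[1] for p in cluster))


# Made-up coordinates of businesses a participant called.
called_businesses = [
    (37.44, -122.14), (37.45, -122.16), (37.43, -122.15),  # three near one city
    (40.71, -74.00), (40.73, -73.99),                      # two near another
]
home_guess = estimate_home(called_businesses)
```

Here the three nearby points form the largest cluster, so the estimate falls in the middle of that group rather than being pulled toward the distant calls. Using a median rather than a mean keeps a single outlying business from shifting the estimate.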
The results in this report demonstrate that overt PII is not necessary in order to obtain personal knowledge of individuals. While some policymakers may contend that access to metadata is benign, in reality, “telephone metadata is densely interconnected, easily reidentifiable, and trivially gives rise to location, relationship, and sensitive inferences.” In order to protect civil liberties, future policies regarding electronic surveillance, and even commercial data collection programs, must be vetted for these privacy loopholes. More direct, honest, and transparent regulatory guidelines benefit both national security and an individual’s right to privacy.
Article source: Mayer, Jonathan, Patrick Mutchler, and John Mitchell. “Evaluating the Privacy Properties of Telephone Metadata.” Proceedings of the National Academy of Sciences of the United States of America, 2016.
Featured photo: cc/(stefanamer, photo ID: 65394685, from iStock by Getty Images)