Monday, July 6, 2009
There is an Elephant in the Room; & Everyone’s Social Security Numbers are Written on Its Hide
Acquisti and Gross tested their prediction method using names in the Death Master File of people who died between 1973 and 2003. They could identify in a single attempt the first five digits for 44 percent of deceased individuals who were born after 1988 and for 7 percent of those born between 1973 and 1988. They were able to identify all nine digits for 8.5 percent of those individuals born after 1988 in less than 1,000 attempts. Their accuracy was considerably higher for smaller states and recent years of birth: for instance, they needed 10 or fewer attempts to predict all nine digits for one out of 20 Social Security numbers issued in Delaware in 1996.
Carnegie Mellon Researchers Find Social Security Numbers can be Predicted from Publicly Available Information, Press Release, 7-6-09
By Richard Power
At a conference last year, toward the end of the day, an attendee asked me what I thought were the top five privacy stories of the year; as I said, it was the end of the day, and perhaps I should have been more positive in my response, but instead, I just blurted out, “there is no privacy.” I wasn’t just being gruff or cynical. In a very real way, “privacy” as it was once espoused is gone.
That doesn’t mean that there are no privacy regulations to uphold; nor does it mean that all those Chief Privacy Officers are simply wasting their time. But there are junctures in history when little or no meaningful progress is going to be made unless we engage in merciless honesty, and we are at just such a juncture in regard to the issue of privacy.
The push for merciless honesty just got a big boost from CyLab researcher Alessandro Acquisti and Heinz College post-doctoral researcher Ralph Gross. They have rocked the world with the revelation that there is an elephant in the living room, and everyone’s Social Security numbers (SSNs) are legible on the grey surface of its massive body.
In a paper that will appear later this week in the online Early Edition of the Proceedings of the National Academy of Sciences, and will be presented later this month at BlackHat 2009 in Las Vegas, Acquisti and Gross have changed the game forever and for good, by demonstrating “that it is possible to predict, entirely from public data, narrow ranges of values wherein individual SSNs are likely to fall.”
“Unless mitigating strategies are implemented,” Acquisti and Gross warn, “the predictability of SSNs exposes them to risks of identity theft on mass scales.”
“Any third party with Internet access and some statistical knowledge can exploit such predictability in two steps: First, by analyzing publicly available records in the SSA Death Master File (DMF) to detect statistical patterns in the SSN assignment for individuals whose deaths have been reported to the SSA; thereafter, by interpolating an alive person's state and date of birth with the patterns detected across deceased individuals' SSNs, to predict a range of values likely to include his SSN. Birth data, in turn, can be inferred from several offline and online sources, including data brokers, voter registration lists, online white pages, or the profiles that millions of individuals publish on social networking sites.”
Over the weekend, as Acquisti prepared himself for the firestorm of news media attention, I asked him three questions that put this blockbuster story in context:
What should the man or woman in the street take away from the results of your research?
"Our results show that it is possible to combine pieces of personal information from different sources, each of them not particularly sensitive, and end up inferring something that is more sensitive than any original piece of information alone. This calls for heightened attention by each of us towards what we reveal about ourselves online. However, our results also indicate that the problem of SSNs security goes much beyond consumers' responsibility and control: it has to do with the use (and abuse) of SSNs in the private sector for purposes (such as authentication) they were never designed to fulfill. As consumers, we have very little control on that. At the end of the day, this is a systematic problem that industry, policy-makers, and of course researchers must resolve."
What should privacy professionals in government and business take away from the results of your research?
“The broader message is that how 'sensitive' certain pieces of information are depends on what other data those pieces of information can be combined with. The more specific message, as it relates to SSNs, is that - in their current form - SSNs are very insecure passwords, and should not be used for authentication in any service.”
Are there any solutions that present themselves?
“Randomizing the assignment of future SSNs can buy us some time. But any system that still relies on using the same number (your SSN) as both an identifier and for authentication is a vulnerable system - because it is incongruously predicated on the sensitivity of a number that is shared with too many parties. For long run solutions, the attention should move towards the research on secure, efficient, and privacy-preserving means of authenticating identities.”
SSA Responds
In a statement issued in response to the research of Acquisti and Gross, the Social Security Administration sought to provide some context of its own:
“The public should not be alarmed by this report because there is no foolproof method for predicting a person's Social Security Number. The method by which Social Security assigns numbers has been a matter of public record for years. The suggestion that Mr. Acquisti has cracked a code for predicting an SSN is a dramatic exaggeration.
For decades, we have cautioned the private sector, including educational, financial and health care institutions, against using the SSN as a personal identifier.
For reasons unrelated to this report, the agency has been developing a system to randomly assign SSNs. This system will be in place next year.
Stealing someone's SSN is a crime.”
Asked to comment on the statement, Acquisti stressed that his study “made no claim about breaking secret code,” but rather “demonstrates that publicly available information about SSNs and their assignment scheme is sufficient to infer regular patterns linked to demographics data, and predict very narrow ranges of values wherein individual SSNs are likely to fall.”
He commended the Social Security Administration for its effort, but urged realism in regard to the facts on the ground: “We want to praise the SSA for cautioning the private sector against using the SSN as a personal identifier. Unfortunately, such cautions have not been sufficient to dissuade many third parties: SSNs are still used (and abused) everywhere to authenticate identities, leading to widespread crimes of identity theft.”
Acquisti also agreed with the SSA’s “short-term” strategy, while again urging realism about the issues we are all confronting: “We also agree with the SSA about the need, in the short term, to randomize the assignment scheme (a proposal we advance in the manuscript). However, changing the assignment scheme may protect newly assigned SSNs, but not the hundreds of millions of already assigned SSNs. It may also make us complacent to preserve the current -- and insecure -- system where SSNs are incongruously used by private sector entities both as public identifiers and private passwords - a role that SSNs were never meant to fulfill when they were designed in the 1930s.”
Emphasizing the vital need for all of us to finally deal with the pervasive issue of weak authentication, Acquisti remarked: “By showing that SSNs are predictable from public data and therefore are vulnerable passwords, we hope to focus the debate toward more secure, efficient, and privacy-preserving means of authenticating identities in our society.”
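To put the reported figures in perspective, here is a rough back-of-the-envelope sketch in Python (it is not taken from the paper). It compares the success rates quoted in the press release with what blind guessing would achieve, under the simplifying assumption that every nine-digit string is an equally likely SSN. In reality not every nine-digit value is ever issued, so the blind-guess odds are only approximate. The function and variable names are illustrative.

```python
from fractions import Fraction


def blind_guess_success(attempts: int, unknown_digits: int) -> Fraction:
    """Chance that `attempts` distinct guesses hit one value drawn uniformly
    from all strings of `unknown_digits` decimal digits (an assumption)."""
    space = 10 ** unknown_digits
    return Fraction(min(attempts, space), space)


# Blind guessing: all nine digits unknown, 1,000 attempts.
blind_all_nine = blind_guess_success(1000, 9)

# Figure quoted in the press release: all nine digits for 8.5 percent of
# individuals born after 1988, in less than 1,000 attempts.
reported_all_nine = Fraction(85, 1000)
advantage_all_nine = reported_all_nine / blind_all_nine

# Blind guessing the first five digits in a single attempt, against the
# quoted 44 percent for individuals born after 1988.
blind_first_five = blind_guess_success(1, 5)
reported_first_five = Fraction(44, 100)
advantage_first_five = reported_first_five / blind_first_five

# Once the first five digits are known, only four digits remain to guess.
after_five_known = blind_guess_success(1000, 4)

print("Blind, nine digits, 1,000 tries:", float(blind_all_nine))
print("Reported vs. blind (nine digits):", int(advantage_all_nine), "times")
print("Reported vs. blind (first five):", int(advantage_first_five), "times")
print("1,000 tries with five digits known:", float(after_five_known))
```

On these assumptions, 1,000 blind guesses at all nine digits succeed about one time in a million, against the 8.5 percent reported for those born after 1988. Once the first five digits are known, the same 1,000 guesses cover a tenth of what remains, which is why a nine-digit number that looks long on paper makes such a weak password.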
CyLab corporate partners can access the full text of this post in the Culture of Security section of the partners-only portal. The full text includes commentary from privacy expert Rebecca Herold, a participant in the CyLab Business Risks Forum.
Additional information about the study and some of the issues it raises is available at http://www.ssnstudy.org, as well as http://blogs.heinz.cmu.edu/ssnstudy/ (blog) and http://www.heinz.cmu.edu/~acquisti/ssnstudy/ (FAQ).
The full press release is available here.