Wednesday, July 15, 2009

SOUPS 2009 Tutorial Explores Challenges of Evaluating Usable Security and Privacy Technology


"Once you have real people using the security in this design, what is the performance that you can expect? What is the performance you can expect at the security level in terms of the choices that the users will make? This is where we have a problem ... We don't have a clear set of criteria to assess a particular performance against ... If you don't have a criteria for what is actually an acceptable level of performance, then you just don't know if it is good enough or not." Angela Sasse, University College London

By Richard Power


As I drove across Google's sprawling Mountain View campus, the memory of a 2005 visit to Microsoft rose up in my mind. I had traveled there to participate in a CSO Council meeting. The Redmond campus is a city-state, of course, with its own police force and its own street. In 2005, Google was only seven years old. Fast forward another four years. Just this month, Google announced that it is going to challenge Microsoft on the OS front (see "Now Google parks its tanks right outside Microsoft's gates," Guardian, 7-12-09).

This afternoon, sitting in a sun-drenched pavilion on Google's grounds during a lunch break, I looked up from my grilled salmon to notice an employee walking by, with her dog on a leash, then I saw another, and then I saw another. There were dogs everywhere. I asked if this happened to be a special "Bring Your Dog to Work Day," but was told, "No, we are allowed to bring our dogs to work every day." Hmmm. Could this remarkable corporate culture innovation give Google an edge in the struggle ahead?

But, of course, I did not come here to handicap the coming clash of the titans; I came here to report to you on the fifth Symposium on Usable Privacy and Security (SOUPS), which Google is hosting and co-sponsoring along with CyLab. (Next year, SOUPS will be held in Redmond.)

SOUPS is an annual event organized by Carnegie Mellon CyLab's Usable Privacy & Security Lab (CUPS).

Several significant evolutionary trends have emerged in cyber security and privacy over the last decade, ranging from the somewhat ill-conceived search for Return on Investment (ROI) in cyber security deployments to the much more promising inquiry into the ways in which the sciences of economics and psychology might better inform cyber security development. The quest for "Usable Security and Privacy" is one of the most intriguing of these trends; and SOUPS provides an invaluable forum for the exploration of themes in this vital area of research.

The first day of SOUPS 2009 was built around an all-day tutorial on "Designing and Evaluating Usable Security and Privacy Technology" led by M. Angela Sasse, Professor of Human-Centred Technology in the Department of Computer Science at UCL, Clare-Marie Karat, Research Staff Member in the Policy Lifecycle Technologies department at the IBM TJ Watson Research Center, and CyLab researcher Roy Maxion, a faculty member in the Computer Science and Machine Learning Departments at Carnegie Mellon University.

Sasse spoke on "Evaluating for Usability & Security."

Karat delivered a "Case Study of Usable Privacy and Security Policy Research."

Maxion shared "Mechanics of Experiments, Forensics, and Security."

Here are some excerpts from my notes and transcription of Sasse's compelling talk:

Sasse started off by citing a "cumbersome" definition of "evaluation" as "an assessment of the conformity between a work system's performance and its desired performance" (Whitefield et al., 1991), and then explained: "What they really mean by 'work system' is if a user works together with a system, what is the performance that you are going to get out of the combination? That is what you are actually interested in. Once you have real people using the security in this design, what is the performance that you can expect? What is the performance you can expect at the security level in terms of the choices that the users will make? This is where we have a problem ... We don't have a clear set of criteria to assess a particular performance against ... If you don't have criteria for what is actually an acceptable level of performance, then you just don't know if it is good enough or not."

In the course of outlining the essentials of a proper usability evaluation plan, Sasse went into some depth concerning evaluation goals, first emphasizing the difference between summative goals (e.g., "Mech 1 performs better than Mech 2" or "Mech 1 meets performance criteria X, Y, Z") and formative goals (e.g., "exploratory evaluation of feasibility and indicative performance, user feedback, pointers for improvement"), and then exploring the breakdown of an evaluation goal.
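
To make the summative case concrete, here is a minimal sketch of my own (not from the tutorial) of how one might test "Mech 1 performs better than Mech 2" against task-completion times; the timing data and the 20-second criterion below are invented purely for illustration:

```python
# A summative comparison sketch: do users complete the authentication
# task faster with mechanism 1 than with mechanism 2?
# All timing data below is hypothetical, purely for illustration.
from scipy import stats

# Task-completion times in seconds for two (hypothetical) user groups
mech1_times = [12.1, 9.8, 14.3, 11.0, 10.5, 13.2, 9.9, 12.7]
mech2_times = [15.4, 18.2, 14.9, 16.7, 17.1, 15.8, 19.0, 16.2]

# Two-sample t-test: is the difference in mean completion time significant?
t_stat, p_value = stats.ttest_ind(mech1_times, mech2_times, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# A summative criterion can also be absolute, e.g. "95% of users
# authenticate in under 20 seconds" -- checked directly against the data.
threshold = 20.0
pass_rate = sum(t < threshold for t in mech1_times) / len(mech1_times)
print(f"Mechanism 1 pass rate under {threshold}s: {pass_rate:.0%}")
```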

"When it comes to usability and security, we need to look at two things: not only how well does the user perform with the security mechanism (e.g., how long does it take the user to remember, read off and enter the one-time pin, how long does it take to figure out which finger to use, where to put, etc.) but also what is the primary task, or production task, within which this security task is performed. The kind of experiments we have seen so far is basically, 'Thank you for coming, try this security mechanism, and I will measure how well you do.' But if it is envisioned, for example, that they do this as part of their on-line banking session, or for governments purposes, to fill their taxes on-line, then we need to create a realistic approximation of what that whole procedure looks like. At what point, would a user normally in the real world approach the system with the goal of 'I'm filing my taxes, and goddamn I'm late, I've got about twenty hours or so to do it."

"There are some things you can do in the lab, but there are some things that you can never really reproduce in a lab. If people anticipate that they are going to have problems with a security mechanism, they are going to change altogether how they behave and how they do their work. You find that because people fear that they might fail to authenticate themselves to a service that they either completely re-organize how they do their work, which has an impact on their productivity, and an impact on the productivity of the organization overall, or they might find workarounds, for instance, they just find ways of leaving the system open in order to avoid entering their credentials time and time again. You are never going to see people do that in the lab experiment."

Along with evaluating "performance achieved on security tasks" as well as the "actual level of security achieved given user choices and behaviour," Sasse also stressed evaluating "at what cost" these were achieved -- "to the individual user, to the system owner and to society as a whole."

"I recognize some of the issues, and they are pretty obvious," Roy Maxton asked Sasse, "so my question is what do you think makes it not obvious to so many people?"

"The answer is that so far security has basically been treated as special," she responded. "A lot of organizations out there are not very good at assessing the ongoing cost of ownership of certain types of technology. And when it comes to security that problem is magnified, because the argument always made is 'Security is important, just think of what could happen if we didn't have it.' They tend to only look at the cost of purchasing it and putting it in place. How much time or productivity is it going to take out of a company is a question I have never seen anybody ask up front, until very recently. Or 'how much of our system administrator's time is it going to take ... it is generally not looked at and factored into the cost of operating. But I am sure this will change. These kinds of ideas have now gotten out there. Traditionally, the only argument for security was risk mitigation, it was very often not off-set by the cost of ownership and operation ..."

Sasse went on to articulate other key elements of the evaluation process:

Scope, both summative (e.g., "sample large enough for adequate statistical power; generally larger samples than for formative evaluations") and formative (e.g., "explanatory results" providing "reasons for user choices" and "reasons for failure") -- see the sample-size sketch after this list;

Participants (e.g., "need to control for practice and interference effects");

Context of use (e.g., need to replicate demands of production task, equipment used, physical context and situational context);

Criteria, including user cost (e.g., physical and mental workload), owner cost (e.g., "needs to be proportionate to degree of security achieved"), user satisfaction (e.g., "user confidence in mechanism itself" and "their own ability to operate it correctly") and, of course, security.
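
On the statistical-power point under Scope, here is a minimal sketch of my own of how the required sample size for a summative comparison might be estimated; the effect size is an assumed value, not a figure from the tutorial:

```python
# Sketch: estimating the sample size a summative evaluation needs for
# adequate statistical power. The effect size is an assumed value,
# chosen purely for illustration.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,            # assumed medium effect (Cohen's d)
    alpha=0.05,                 # significance level
    power=0.8,                  # desired chance of detecting the effect
    alternative='two-sided',
)
print(f"Participants needed per condition: {n_per_group:.0f}")
```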

Aye, but there's the rub. The internationally recognized evaluation framework, the Common Criteria for Information Technology Security Evaluation (ISO/IEC 15408), does not include usability.

In her conclusion, Sasse emphasized the need to develop a framework for evaluating usability and security to ensure comparable and generalisable results, suggesting that it should be "linkable to assessment via Common Criteria" and might incorporate the NIST taxonomy as a template for procedure.

These notes reflect only some of the richness of the discussion in this day-long tutorial led by Karat, Maxion and Sasse.

Stay tuned for more from SOUPS 2009 over the next two days.

Some Related Posts:

SOUPS 2009 Best Paper Award Goes to "Ubiquitous Systems and the Family: Thoughts about the Networked Home"

CyLab Seminar Series Notes: User-Controllable Security and Privacy -- Norman Sadeh asks, "Are Expectations Realistic?"

CyLab Research Update: Locaccino Enables the Watched to Watch the Watchers

CyLab Chronicles: Wombat, the Latest CyLab Success Story

CyLab Chronicles: Q&A w/ Norman Sadeh

CyLab Chronicles: Q&A w/ Lorrie Cranor

Culture of Security: CUPS Research Takes on Both Widespread Attack Method & Dangerous Meme (Available to CyLab Partners Only)

For further commentary on SOUPS 2009, go to the CUPS Blog.