Hsu-Chun Hsiao delivers a paper on LAP: Lightweight Anonymity and Privacy
A Report on the 2012 IEEE Symposium on Security and Privacy
As noted in previous CyBlog posts, IEEE's annual Symposium on Security and Privacy (a.k.a. "Oakland") is an important event in the realm of academic research on how best to strengthen cyber security and privacy. This year's Symposium lived up to expectations. (And I am not just saying that because Carnegie Mellon University CyLab's imprint was on eight different sessions. See CyLab Chronicles: CyLab's Strong Presence at IEEE Security and Privacy 2012 Packs A Wallop.)
Here are a few glimpses into some sessions that interested me.
Prudent Practices for Designing Malware Experiments
Christian Rossow of the Institute for Internet Security delivered a talk on "Prudent Practices for Designing Malware Experiments," a paper co-authored with Christian J. Dietrich and Norbert Pohlmann, also of the Institute for Internet Security, along with Chris Grier, Christian Kreibich and Vern Paxson of the University of California, Berkeley and the International Computer Science Institute, Berkeley, as well as Herbert Bos and Maarten van Steen of VU University Amsterdam, The Network Institute.
Rossow articulated numerous guidelines on safety, transparency, realism and correct data sets.
I have pulled out one example guideline from each category:
Safety: "1) Deploy and describe containment policies. Well-designed containment policies facilitate realistic experiments while mitigating the potential harm malware causes to others over time. Experiments should at a minimum employ basic containment policies such as redirecting spam and infection attempts, and identifying and suppressing DoS attacks. Authors should discuss the containment policies and their implications on the fidelity of the experiments. Ideally, authors also monitor and discuss security breaches in their containment."
Transparency: "4) Mention the system used during execution. Malware may execute differently (if at all) across various systems, software configurations and versions. Explicit description of the particular system(s) used (e.g., 'Windows XP SP3 32bit without additional software installations') renders experiments more transparent, especially as presumptions about the 'standard' OS change with time. When relevant, authors should also include version information of installed software."
Realism: "5) Consider allowing Internet access to malware. Deferring legal and ethical considerations for a moment, we argue that experiments become significantly more realistic if the malware has Internet access. Malware often requires connectivity to communicate with command-and-control (C&C) servers and thus to expose its malicious behavior. In exceptional cases where experiments in simulated Internet environments are appropriate, authors need to describe the resulting limitations."
Correct data sets: "2) Balance datasets over malware families. In unbalanced datasets, aggressively polymorphic malware families will often unduly dominate datasets filtered by sample-uniqueness (e.g., MD5 hashes). Authors should discuss if such imbalances biased their experiments, and, if so, balance the datasets to the degree possible. [...] explicitly if they decide to blend malicious traces with benign background activity."
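To make the "balance datasets over malware families" guideline a little more concrete, here is a minimal sketch of one way to cap how many samples any single family contributes. It is not taken from the paper: the function name balance_by_family, the per_family cap, and the (sample_id, family) input format are all assumptions made for illustration, and it presumes family labels are already available.

```python
import random
from collections import defaultdict


def balance_by_family(samples, per_family, seed=0):
    """Return at most `per_family` samples from each malware family.

    `samples` is an iterable of (sample_id, family) pairs. The input
    format and the cap-based strategy are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    by_family = defaultdict(list)
    for sample_id, family in samples:
        by_family[family].append(sample_id)

    balanced = []
    for family in sorted(by_family):
        ids = by_family[family]
        # Small families are kept whole; large ones are randomly subsampled.
        chosen = ids if len(ids) <= per_family else rng.sample(ids, per_family)
        balanced.extend((sample_id, family) for sample_id in chosen)
    return balanced
```

With a cap of 5, a polymorphic family with 100 unique hashes and a rare family with 3 samples would contribute 5 and 3 samples respectively, so the polymorphic family no longer dominates simply because each variant hashes differently.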
Detecting Hoaxes, Frauds, and Deception in Writing Style Online
Sadia Afroz of Drexel University delivered a talk on "Detecting Hoaxes, Frauds, and Deception in Writing Style Online," a paper co-authored with colleagues Michael Brennan and Rachel Greenstadt.
This fascinating paper drew on a compelling story from recent headlines: the strange tale of Amina, the "Gay Girl in Damascus," whose blog captured the world's attention during the early days of the Arab Spring, only to be revealed later as the work of Thomas MacMaster, a 40-year-old American man.
In reporting on the research, Afroz and her colleagues concluded:
"Stylometry is necessary to determine authenticity of a document to prevent deception, hoaxes and frauds. In this work, we show that manual counter-measures against stylometry can be detected using second-order effects. That is, while it may be impossible to detect the author of a document whose authorship has been obfuscated, the obfuscation itself is detectable using a large feature set that is content-independent. Using Information Gain Ratio, we show that the most effective features for detecting deceptive writing are function words. We analyze a long-term deception and show that regular authorship recognition is more effective than deception detection to find indication of stylistic deception in this case."
As Afroz and her colleagues also point out, such research has implications for adversarial learning in general:
"Machine learning is often used in security problems from spam detection, to intrusion detection, to malware analysis. In these situations, the adversarial nature of the problem means that the adversary can often manipulate the classifier to produce lower quality or sometimes entirely ineffective results. In the case of adversarial writing, we show that using a broader feature set causes the manipulation itself to be detectable. This approach may be useful in other areas of adversarial learning to increase accuracy by screening out adversarial inputs."
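For readers wondering what a content-independent feature such as function-word usage might look like in practice, here is a rough sketch. It is not the authors' feature set or code: the short FUNCTION_WORDS tuple, the tokenizer, and the function name are assumptions chosen for illustration, and a real stylometry system would use a far larger word list and many more feature types.

```python
import re
from collections import Counter

# A tiny, hypothetical function-word list; real systems use hundreds.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it",
                  "is", "was", "for", "on", "with", "as", "but")


def function_word_features(text):
    """Return the relative frequency of each function word in `text`.

    The vector says nothing about topic, only about how often the
    writer reaches for each of these small connective words.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]
```

Vectors like this, computed for each document, could then be handed to a classifier or ranked with a measure such as information gain ratio to see which words best separate regular writing from deceptive writing.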
Oakrams: Searching Through Strands of Oakland's DNA
The three-day event culminated in an all-star panel on "How can a Focus on 'Science' Advance Research in Cyber Security?" The panel, moderated by Carl Landwehr, included Alessandro Acquisti (Carnegie Mellon), Dan Boneh (Stanford), Joshua Guttman (Worcester Polytechnic Institute), Wenke Lee (Georgia Tech) and Cormac Herley (Microsoft), and debated whether the realm of cyber security as currently constituted should be, or already is, "science." But honestly, in spite of some sparkling insights, particularly from Acquisti and Herley, this debate has a certain dog-chasing-its-tail futility to it. It is the kind of debate that becomes central after it is already too late to grasp the reality of a situation. It reminded me of a sage perspective delivered back in the 1990s by the legendary Donn B. Parker: "Information Security, A Folk Art in Need of An Upgrade." Parker was spot-on about that, as well as about other issues.
So before the theme music to the Bill Murray film Groundhog Day once again starts to rise up in my psyche, let me turn away from the august panel and its erudite dialogue, and end this report from Oakland on a "short talk" in which Hilarie Orman (Purple Streak, Inc.) shared her "Oakrams."
I suggest there is at least as much import in them as in the debate over "cyber security" as "science."
Orman was kind enough to explain her exercise to me.
"I call them 'Oakrams' (the conference used to be called 'Oakland' informally, and the software is based on an open source system called 'WordCram'). I modified WordCram so that I could control the coloring based on the word position, and so that I could reuse a word placement while changing size and color. This resulted in two sequences of images. I preprocessed the text of the papers so that for each year I had an ordered list of all non-trivial words that occurred 20 times or more. In the first sequence, for each year of the conference, I arranged the words so that the size and color intensity were proportional to the word's frequency for that year. I modified WordCram to get word arrangements that were both denser and more uniform than its usual algorithms could produce. The word coloring varied uniformly over a small color range from top to bottom and left to right. Each year had a slightly different range, overlapping with the previous year, and drifting from yellow through green in 1980 to the final blue through reddish yellow in 2012.
The word arrays seemed endlessly interesting to me. Some words are loaded with context in the security world, and their presence or absence in an array was a source for reflection. As a small example, the word 'alice' appeared briefly in one or two years, but never rose to prominence.
These arrays showed that 'system,' 'information,' and 'security' were usually the most frequent words in each year. This wasn't surprising, but I wanted to get more information about the words that had varying popularity, and I wondered if the words could point out trends in topics. That led to the next phase.
The second sequence of images used only 50 words. These were the words that were the 'most popular' over the 33 years. For each year, each word had the same placement in the visual array, but the size and color varied. The size of a word was proportional to its frequency for that year. The color hue varied from red to blueish-purple, where red meant the word had not occurred in the previous 5 years, and the amount of blue represented its average frequency during the previous 5 years. As words moved in and out of popularity their size and color and opacity varied to reflect their usage.
It was interesting to see how long it took for networking terms like 'message,' 'packet,' and 'node' to get traction. I was amazed that 'privacy' has rarely been a major term, despite it being part of the 'Security and Privacy' symposium's name! And, to me, it was quite significant that 'application' and 'attack' have become major terms: we used to focus on provably secure operating systems, now we try to protect individual applications against specific attacks.
I'm a calligrapher and student of typography; the wordcrams are artistic objects that I enjoy, but they carry some fragments of meaning, like pieces of DNA."
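The text preprocessing Orman describes can be approximated in a few lines. What follows is only a sketch of the counting side, not her code and not WordCram itself: the function names, the stopword set standing in for "non-trivial words," and the dictionary layout of per-year frequencies are all assumptions made for illustration.

```python
from collections import Counter


def frequent_words(tokens, stopwords, min_count=20):
    """Return (word, count) pairs, most frequent first, for words that
    are not in `stopwords` and occur at least `min_count` times.
    """
    counts = Counter(t for t in tokens if t not in stopwords)
    return [(w, c) for w, c in counts.most_common() if c >= min_count]


def trailing_average(freq_by_year, word, year, window=5):
    """Average frequency of `word` over the `window` years before `year`.

    `freq_by_year` maps year -> {word: count}; years or words with no
    data count as zero. A result of 0 corresponds to the "red" case in
    the second image sequence (no occurrences in the previous 5 years).
    """
    prior = [freq_by_year.get(y, {}).get(word, 0)
             for y in range(year - window, year)]
    return sum(prior) / window
```

In this sketch, the per-year count would drive a word's size, while the trailing average would drive how far its hue shifts from red toward blue.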
-- Richard Power
CyLab Research has Powerful Impact on 2010 IEEE Security and Privacy Symposium
Microcosm & Macrocosm: Reflections on 2010 IEEE Symposium on Security and Privacy; Q and A on Cloud, Cyberwar and Internet Freedom with Dr. Peter Neumann
Five Papers Add to Impressive CyLab Presence at ACM CCS 2011
CyLab Research Presentations Impact CHI 2011
USENIX Security 2011: Another Ring on the Tree Trunk for One of Cyber Security's Worthiest Gatherings, and a Strong CyLab Presence
USENIX Security 2011: CyLab Researchers Release Study on Illicit Online Drug Trade and Attacks on Pharma Industry