My research focuses on building systems to facilitate a data-driven approach to security. The primary goal of my program is to apply this methodology to rigorously analyze the behavior of online attacks and facilitate developing platforms to discover and mitigate these attacks in a scalable and reliable manner. The problems that I tackle often involve the intersection of society, technology, and security. My research seeks to create solutions to evaluate the security and privacy implications of emerging technologies, identify associated threats, and improve the agility of defenders in responding to those threats in a timely fashion. My work has helped to develop techniques to protect users from important security problems, including ransomware and online scams, and guide the design of new defense systems.
My work seeks to facilitate a data-driven approach to enhance our understanding of the behavior of software systems as well as to build techniques to tackle their immediate, significant socio-technical problems.
The Web has become an essential part of our daily activity. One of the defining elements of the Web is the ability to link third-party web content. Using third-party content can be viewed as an assertion of trust that the content is benign. Due to the dynamic nature of the Web, however, this assertion can be violated in several ways. A common theme of today’s online attacks, which include web-based scams and malicious code distribution, is that adversaries exploit the dynamicity of the Web ecosystem and perform operations that are almost indistinguishable from legitimate behavior. A major thrust of my recent work focuses on improving the security and privacy of online users and restoring their confidence in transactions over the Internet. Four recent projects stand out in this thrust as significant examples of my research style.
As part of my efforts to build web security defense systems, I developed an automated system to detect in-browser cryptojacking and scientifically explore the underlying ecosystem. In the paper entitled “Outguard: Detecting In-Browser Covert Cryptocurrency Mining in the Wild,” which appeared at the Web Conference 2019 (WWW ’19), we deployed Outguard in the wild and identified 35 cryptojacking campaigns, 6,328 cryptojacking websites, and 24 previously unseen free or low-cost mining services that operators used to schedule their mining tasks. Compared to prior work, the detection features of Outguard make it a more generic solution to the problem. The approach relies on the core components of in-browser cryptojacking, such as parallel processing, and does not incorporate static or threshold-based metrics such as the type of cryptographic function or CPU utilization features. The paper received the best paper award among 225 accepted papers and 1,247 submissions.
Targeted marketing surveys bring in more than $21 billion in annual revenue by providing insights into what customers are thinking in a specific business sector. Adversaries have also discovered online surveys as a profitable attack vector, recruiting unsuspecting users and tricking them into releasing sensitive information about themselves or their companies. In the paper entitled “Surveylance: Automatically Detecting Online Survey Scams,” which appeared in IEEE Security & Privacy 2018 (S&P ’18), I investigated the problem of survey scams in the wild by developing a tool to uncover the scale, underlying structure, attackers’ modus operandi, and contributing parties involved in these attacks. Using Surveylance, I identified 8,623 websites, called survey gateways, that directed victims to more than 318K online survey scam pages. 127K (40%) of these pages were easily reachable from the Alexa top 30K websites. These websites exposed users to a wide range of security issues, including identity fraud, deceptive advertisements, potentially unwanted programs (PUPs), scareware, and malware. The paper provided several examples showing that the threat is serious and under-explored, and that normal web users are highly susceptible to these attacks.
JavaScript has enabled many of the modern functionalities in web browsers. However, the weakly-typed and dynamic nature of this language, as well as the relatively loose security guarantees of modern browsers for JavaScript code execution, often leads to unintended side effects. In the paper entitled “Include Me Out: In-Browser Detection of Malicious Third-Party Content Inclusions,” which appeared in Financial Cryptography 2016, we proposed Excision to reduce the exposure of online users to malicious third-party content inclusions. To achieve this goal, we created an in-browser framework that provided a high-fidelity view of the third-party inclusion process as well as the ability to interdict content loading in real time. This approach allowed us to uncover the nature, structure, and distribution infrastructure of remote code loading, including malicious and previously unseen code. Studying web browser internals to conduct this research also informed our fundamental understanding of JavaScript parsing and rendering mechanisms, and led to a follow-on paper entitled “Identifying Extension-based Ad Injection via Fine-grained Web Content Provenance,” which appeared in the proceedings of RAID 2016. In that paper, we proposed a light-weight provenance tracking technique in the browser to monitor content modification by extensions during content loading. The ability to provide a granular view of dynamic inclusions has also inspired other researchers in the area of web privacy to analyze web tracking and advertisement delivery.
A large body of my research revolves around developing applied tools and techniques that identify the behavior of malicious binaries and reduce the exposure of end-users to malware attacks.
Automated malware analysis systems (or sandboxes) are among the most sophisticated tools in the malware research arsenal. These systems execute unknown binaries in an instrumented environment and monitor their execution. An important question is how to build the analysis environment so that the tool reveals the actual behavior of malicious code while resisting evasive attacks that try to fingerprint the analysis environment. As part of an NSF project, I developed an analysis environment, called Unveil, that generates semantically rich traces of malware activity by instrumenting different areas of the Windows kernel, making the approach resistant to common anti-analysis fingerprinting techniques. The tool also incorporated a multi-class machine learning model to automatically label different classes of malware families, including 26 different ransomware families. Unveil quickly gained popularity among security professionals and researchers, and has been used as a malware sample repository by security researchers. Over 27 months, Unveil analyzed more than two million samples and created a dataset of more than 280,000 ransomware samples from 26 different families and 132,000 trojans. We shared over 10 TB of data, which was used in 20+ academic papers in the malware research domain. The paper entitled “UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware” appeared in the proceedings of USENIX 2016.
Adversaries actively develop attacks to target end-point devices with the hope of gaining access to more critical systems. My primary contribution to end-point security practice, beyond analyzing online attacks on end-users, was to design end-point solutions that reduce the exposure of users to online attacks without introducing discernible performance impacts. Developing practical end-point solutions requires addressing a set of unique challenges. For instance, satisfying usability and performance guarantees is as crucial as satisfying security guarantees. Two recent end-point security projects stand out as significant examples of developing end-point solutions. In the paper entitled “Redemption: Real-time Protection Against Ransomware at End-Hosts,” which appeared in the proceedings of RAID 2017 [5], I explored the possibility of protecting user data from previously unknown ransomware attacks. In addition to its direct impact, perhaps the most interesting lesson from the study was that it is possible to minimally update the filesystem and protect against a large number of ransomware attacks in the wild with zero data loss. Redemption was tested on 29 contemporary ransomware families. The work inspired other researchers in the area, and 10+ subsequent research techniques were tested based on the generated dataset.
In another work entitled “USBeSafe: An End-Point Solution to Protect Against USB-Based Attacks,” which appeared in the proceedings of RAID 2019 [2], I developed a generic machine learning model to prevent BadUSB attacks. In BadUSB attacks, adversaries can easily hide their malicious code in the USB firmware, allowing the device to take covert actions on the host. In particular, a rogue USB device could register itself as both a storage device and a Human Interface Device (HID) such as a keyboard, enabling it to inject surreptitious keystrokes to carry out malicious actions. USB is a mature technology and is widely deployed. Therefore, in addition to all the design guarantees that have to be satisfied, a practical solution should avoid any modification at the level of the USB communication protocol or the way users interact with their devices. We sought to improve the security of USB devices while keeping the corresponding protection mechanism entirely in the background. By developing USBeSafe, we showed that a machine learning model similar to ours could explain benign data in a precise fashion without requiring any modification to the USB protocol or user experience. By deploying the tool on real users’ machines for 20 days, I empirically showed that USBeSafe could be incorporated as a light-weight operating system service and disable the offending port.