I describe my research as empirical system security and privacy.
An overarching theme of my research has been to enable the evaluation of the implications of emerging technologies and their associated threats.
I apply data-driven techniques to rigorously analyze online threats and build research tools to discover them in a scalable and reliable manner.
My research requires knowledge of fundamental areas in computer science such as software engineering, machine learning, computer networks, and operating systems. This has allowed me to work on multiple topics such as Web and browser security, code analysis, and malware detection.
I am always looking for talented new members to join SecLab. If you are a graduate student with a strong computer science background, or an undergraduate who wants to learn more about systems security, I would like to chat with you!
Growing numbers of advanced malware-based attacks against governments and corporations, for political, financial, and scientific gains, have taken security breaches to the next level. In response to such attacks, both academia and industry have investigated techniques to model and reconstruct these attacks and to defend against them. While such efforts have all been useful in mitigating the effects of modern attacks, automated malware code reuse inspection and campaign attribution have received less attention. In this paper, we present an automated system, called SCRUTINIZER, to identify code reuse in malware via a novel machine learning-based encoding mechanism at the function level. By creating a large knowledge base of previously observed and tagged malware campaigns, we can compare unknown samples against this knowledge base and determine how much overlap exists. SCRUTINIZER leverages an unsupervised learning approach to filter out irrelevant functions before code reuse detection. It provides two valuable capabilities. First, it identifies ties between an unknown sample and those malware specimens that are known to be used by a specific campaign. Second, it inspects whether specific tools or functionalities are used by a campaign. Using SCRUTINIZER, we were able to identify 12 samples that were previously unknown to us and that we were able to correctly assign to well-known APT campaigns.
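The knowledge-base comparison described above can be illustrated with a small sketch. It is not SCRUTINIZER's method: the abstract describes a learned function-level encoding, whereas this sketch swaps in exact matching of function identifiers (for example, hashes) and a plain overlap ratio. The names `overlap_ratio`, `rank_campaigns`, and the sample data are hypothetical.

```python
def overlap_ratio(sample_functions, campaign_functions):
    """Fraction of the sample's functions that also appear in a campaign.

    Assumption: functions are represented by comparable identifiers such as
    hashes; the real system uses a learned encoding instead of exact matches.
    """
    sample = set(sample_functions)
    if not sample:
        return 0.0
    return len(sample & set(campaign_functions)) / len(sample)


def rank_campaigns(sample_functions, knowledge_base):
    """Rank tagged campaigns by how much of the unknown sample they cover."""
    scores = {
        name: overlap_ratio(sample_functions, functions)
        for name, functions in knowledge_base.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical knowledge base: campaign name -> set of function identifiers.
knowledge_base = {
    "campaign_a": {"f1", "f2", "f3"},
    "campaign_b": {"f9"},
}
ranking = rank_campaigns(["f1", "f2", "f4", "f9"], knowledge_base)
```

A high ratio for one campaign would only suggest a tie worth inspecting; shared library code would inflate it, which is presumably why irrelevant functions are filtered out before comparison.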
The password recovery process is a critical part of a website’s functionality. Many websites that provide online services to their users also need to solve the problem of allowing their users to reset their passwords (e.g., if they have forgotten them). A popular, established technique for allowing a user to recover a lost account is to allow her to send a reset link to her own account via email. Although it might seem easy at first glance, the security requirements of the password recovery process require websites to carefully design each step of the process to be resilient even in the presence of an attack. In this paper, we present an in-depth security analysis of the email-based recovery mechanisms of a wide range of web applications. By manually registering accounts and triggering the password recovery process for each website, we were able to study the password reset mechanisms of websites from three different groups in the Alexa Top 5K (i.e., popular, moderately popular, and less popular sites). In this work, we show that the lack of standards in the password recovery process plagues many websites with security weaknesses, and negatively influences the security of the reset process itself. We also show that concrete password-recovery reset attacks can be launched against a high percentage of websites that might even lead to account takeover.
Developers are increasingly deploying web applications which require real-time bidirectional updates, a use case which does not naturally align with the traditional client-server architecture of the web. Many solutions have arisen to address this need over the preceding decades, including HTTP polling, Server-Sent Events, and WebSockets. This paper investigates this ecosystem and reports on the prevalence, benefits, and drawbacks of these technologies, with a particular focus on the adoption of WebSockets. We crawl the Tranco Top 1 Million websites to build a dataset for studying real-time updates in the wild. We find that HTTP Polling remains significantly more common than WebSockets, and WebSocket adoption appears to have stagnated in the past two to three years. We investigate some of the possible reasons for this decrease in the rate of adoption, and we contrast the adoption process to that of other web technologies. Our findings further suggest that even when WebSockets are employed, the prescribed best practices for securing them are often disregarded. The dataset is made available in the hopes that it may help inform the development of future real-time solutions for the web.
In today’s web, it is not uncommon for web applications to take a complete URL as input from users. Usually, once the web application receives a URL, the server opens a connection to it. However, if the URL points to an internal service and the server still makes the connection, the server becomes vulnerable to Server-Side Request Forgery (SSRF) attacks. These attacks can be highly destructive when they exploit internal services. They are equally destructive and need much less effort to succeed if the server is hosted in a cloud environment. Therefore, with the growing use of cloud computing, the threat of SSRF attacks is becoming more serious. In this paper, we present a novel defense approach to protect internal services from SSRF attacks. Our analysis of more than 60 SSRF vulnerability reports shows that developers’ awareness about this vulnerability is generally limited. Therefore, coders usually have flaws in their defenses. Even when these defenses have no flaws, they are usually still affected by important security and functionality limitations. In this work, we develop a prototype based on the proposed approach by extending the functionality of a popular reverse proxy application and deploy a set of vulnerable web applications with that prototype. We demonstrate how SSRF attacks on these applications, with almost no loss of performance, are prevented.
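To make the problem concrete, here is a minimal sketch of the kind of application-level check a developer might write before fetching a user-supplied URL. It is not the reverse-proxy defense proposed in the paper, and it has exactly the sort of limitation the abstract alludes to: a host that resolves differently between the check and the actual request (DNS rebinding) would bypass it. The function names and the injectable `resolve` parameter are assumptions made for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_internal_address(ip_text: str) -> bool:
    """Return True if the address belongs to a non-public range."""
    # Drop an IPv6 zone identifier such as "%eth0" before parsing.
    ip = ipaddress.ip_address(ip_text.split("%", 1)[0])
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local  # includes 169.254.169.254, used by cloud metadata services
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def url_targets_internal_host(url: str, resolve=socket.getaddrinfo) -> bool:
    """Return True if any address the URL's host resolves to is internal.

    `resolve` is injectable so the check can be exercised without network
    access; by default the system resolver is used.
    """
    host = urlparse(url).hostname
    if host is None:
        return True  # treat input without a parseable host as unsafe
    try:
        infos = resolve(host, None)
    except socket.gaierror:
        return True  # fail closed when resolution fails
    return any(is_internal_address(info[4][0]) for info in infos)
```

A server would call `url_targets_internal_host(user_url)` and refuse the request when it returns True. Because the later fetch performs its own resolution, this check alone is not sufficient, which is one motivation for enforcing the policy at the network or proxy layer instead.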
Targeted attacks via transient devices are not new. However, the introduction of BadUSB attacks has shifted the attack paradigm tremendously. Such attacks embed malicious code in device firmware and exploit the lack of access control in the USB protocol. In this paper, we propose USBESAFE as a mediator of the USB communication mechanism. By leveraging the insights from millions of USB packets, we propose techniques to generate a protection model that can identify covert USB attacks by distinguishing BadUSB devices as a set of novel observations. Our results show that USBESAFE works well in practice by achieving a true positive [TP] rate of 95.7% with 0.21% false positives [FP] with latency as low as three malicious USB packets on USB traffic. We tested USBESAFE by deploying the model at several end-points for 20 days and running multiple types of BadUSB-style attacks with different levels of sophistication. Our analysis shows that USBESAFE can detect a large number of mimicry attacks without introducing any significant changes to the standard USB protocol or the underlying systems. The performance evaluation also shows that USBESAFE is transparent to the operating system, and imposes no discernible performance overhead during the enumeration phase or USB communication compared to the unmodified Linux USB subsystem.
In-browser cryptojacking is a form of resource abuse that leverages end-users’ machines to mine cryptocurrency without obtaining the users’ consent. In this paper, we design, implement, and evaluate Outguard, an automated cryptojacking detection system. We construct a large ground-truth dataset, extract several features using an instrumented web browser, and ultimately select seven distinctive features that are used to build an SVM classification model. Outguard achieves a 97.9% TPR and 1.1% FPR and is reasonably tolerant to adversarial evasions. We utilized Outguard in the wild by deploying it across the Alexa Top 1M websites and found 6,302 cryptojacking sites, of which 3,600 are new detections that were absent from the training data. These cryptojacking sites paint a broad picture of the cryptojacking ecosystem, with particular emphasis on the prevalence of cryptojacking websites and the shared infrastructure that provides clues to the operators behind the cryptojacking phenomenon.
Online surveys are a popular mechanism for performing market research in exchange for monetary compensation. Unfortunately, fraudulent survey websites are similarly rising in popularity among cyber-criminals as a means for executing social engineering attacks. In addition to the sizable population of users that participate in online surveys as a secondary revenue stream, unsuspecting users who search the web for free content or access codes to commercial software can also be exposed to survey scams. This occurs through redirection to websites that ask the user to complete a survey in order to receive the promised content or a reward. In this paper, we present SURVEYLANCE, the first system that automatically identifies survey scams using machine learning techniques. Our evaluation demonstrates that SURVEYLANCE works well in practice by identifying 8,623 unique websites involved in online survey attacks. We show that SURVEYLANCE is suitable for assisting human analysts in survey scam detection at scale. Our work also provides the first systematic analysis of the survey scam ecosystem by investigating the capabilities of these services, mapping all the parties involved in the ecosystem, and quantifying the consequences to users that are exposed to these services. Our analysis reveals that a large number of survey scams are easily reachable through the Alexa top 30K websites, and expose users to a wide range of security issues including identity fraud, deceptive advertisements, potentially unwanted programs (PUPs), malicious extensions, and malware.
Ransomware is a form of extortion-based attack that locks the victim's digital resources and requests money to release them. The recent resurgence of high-profile ransomware attacks, particularly in critical sectors such as the health care industry, has highlighted the pressing need for effective defenses. While users are always advised to have a reliable backup strategy, the growing number of paying victims in recent years suggests that an endpoint defense that is able to stop and recover from ransomware's destructive behavior is needed. In this paper, we introduce Redemption, a novel defense that makes the operating system more resilient to ransomware attacks. Our approach requires minimal modification of the operating system to maintain a transparent buffer for all storage I/O. At the same time, our system monitors the I/O request patterns of applications on a per-process basis for signs of ransomware-like behavior. If I/O request patterns are observed that indicate possible ransomware activity, the offending processes can be terminated and the data restored. Our evaluation demonstrates that Redemption can ensure zero data loss against current ransomware families without detracting from the user experience or inducing alarm fatigue. In addition, we show that Redemption incurs modest overhead, averaging 2.6% for realistic workloads.
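As a rough illustration of per-process I/O monitoring, the sketch below tracks how many of a process's writes look like high-entropy (encrypted or compressed) data. This is an assumption chosen for illustration, not Redemption's actual detection logic: the abstract does not list the signals used, and the class name, thresholds, and the choice of entropy as the only feature are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a buffer in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


class ProcessWriteMonitor:
    """Flags processes whose writes are mostly high-entropy buffers.

    All thresholds are illustrative guesses, not values from the paper.
    """

    def __init__(self, entropy_threshold=7.5, min_writes=4, suspicious_fraction=0.8):
        self.entropy_threshold = entropy_threshold
        self.min_writes = min_writes
        self.suspicious_fraction = suspicious_fraction
        self._stats = {}  # pid -> (total writes, high-entropy writes)

    def record_write(self, pid: int, data: bytes) -> None:
        total, high = self._stats.get(pid, (0, 0))
        if shannon_entropy(data) >= self.entropy_threshold:
            high += 1
        self._stats[pid] = (total + 1, high)

    def is_suspicious(self, pid: int) -> bool:
        total, high = self._stats.get(pid, (0, 0))
        if total < self.min_writes:
            return False  # too little evidence to judge
        return high / total >= self.suspicious_fraction
```

Entropy alone would misfire on legitimate compression or encryption tools, so a practical system would need to combine several signals and, as the abstract describes, buffer writes so that data can be restored when a process is terminated.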
If you need a ransomware dataset for your research, please send me an email at kharraz[at]illinois[dot]edu using your organization email address.
Recent years have seen the rise of many classes of cyber attacks ranging from ransomware to Advanced Persistent Threats (APTs) which pose severe risks to companies and enterprises. While static detection and signature-based tools are still useful in detecting already observed threats, they lag behind in detecting such sophisticated attacks where adversaries are adaptable and can evade defenses. This book chapter intends to explain how to analyze the nature of current multi-dimensional attacks, and how to identify the root causes of such security incidents. The chapter also elaborates on how to incorporate the acquired intelligence to minimize the impact of complex threats, and perform rapid incident response.
Although the concept of ransomware is not new (i.e., such attacks date back at least as far as the 1980s), this type of malware has recently experienced a resurgence in popularity. In fact, in 2014 and 2015, a number of high-profile ransomware attacks were reported, such as the large-scale attack against Sony that prompted the company to delay the release of the film "The Interview". Ransomware typically operates by locking the desktop of the victim to render the system inaccessible to the user, or by encrypting, overwriting, or deleting the user's files. However, while many generic malware detection systems have been proposed, none of these systems have attempted to specifically address the ransomware detection problem. In this paper, we present a novel dynamic analysis system called UNVEIL that is specifically designed to detect ransomware. The key insight of the analysis is that in order to mount a successful attack, ransomware must tamper with a user's files or desktop. UNVEIL automatically generates an artificial user environment, and detects when ransomware interacts with user data. In parallel, the approach tracks changes to the system's desktop that indicate ransomware-like behavior. Our evaluation shows that UNVEIL significantly improves the state of the art, and is able to identify previously unknown evasive ransomware that was not detected by the anti-malware industry.
If you need a ransomware dataset for your research, please send me an email at kharraz[at]illinois[dot]edu using your organization email address.
Extensions provide useful additional functionality for web browsers, but are also an increasingly popular vector for attacks. Due to the high degree of privilege extensions can hold, extensions have been abused to inject advertisements into web pages that divert revenue from content publishers and potentially expose users to malware. Users are often unaware of such practices, believing the modifications to the page originate from publishers. Additionally, automated identification of unwanted third-party modifications is fundamentally difficult, as users are the ultimate arbiters of whether content is undesired in the absence of outright malice. To resolve this dilemma, we present a fine-grained approach to tracking the provenance of web content at the level of individual DOM elements. In conjunction with visual indicators, provenance information can be used to reliably determine the source of content modifications, distinguishing publisher content from content that originates from third parties such as extensions. We describe a prototype implementation of the approach called OriginTracer for Chromium, and evaluate its effectiveness, usability, and performance overhead through a user study and automated experiments. The results demonstrate a statistically significant improvement in the ability of users to identify unwanted third-party content such as injected ads with modest performance overhead.
In this paper, we present the results of a long-term study of ransomware attacks that have been observed in the wild between 2006 and 2014. We also provide a holistic view on how ransomware attacks have evolved during this period by analyzing 1,359 samples that belong to 15 different ransomware families. Our results show that, despite a continuous improvement in the encryption, deletion, and communication techniques in the main ransomware families, the number of families with sophisticated destructive capabilities remains quite small. In fact, our analysis reveals that in a large number of samples, the malware simply locks the victim's computer desktop or attempts to encrypt or delete the victim's files using only superficial techniques. Our analysis also suggests that defending against ransomware attacks is not as complex as has been previously reported. For example, we show that by monitoring abnormal file system activity, it is possible to design a practical defense system that could stop a large number of ransomware attacks, even those using sophisticated encryption capabilities. A close examination of the file system activities of multiple ransomware samples suggests that by looking at I/O requests and protecting the Master File Table (MFT) in the NTFS file system, it is possible to detect and prevent a significant number of zero-day ransomware attacks.
QR codes, a form of 2D barcode, allow easy interaction between mobile devices and websites or printed material by removing the burden of manually typing a URL or contact information. QR codes are increasingly popular and are likely to be adopted by malware authors and cyber-criminals as well. In fact, while a link can look suspicious, malicious and benign QR codes cannot be distinguished by simply looking at them. However, despite public discussions about increasing use of QR codes for malicious purposes, the prevalence of malicious QR codes and the kinds of threats they pose are still unclear.
In this paper, we examine attacks on the Internet that rely on QR codes. Using a crawler, we performed a large-scale experiment by analyzing QR codes across 14 million unique web pages over a nine-month period. Our results show that QR code technology is already used by attackers, for example to distribute malware or to lead users to phishing sites. However, the relatively few malicious QR codes we found in our experiments suggest that, on a global scale, the frequency of these attacks is not alarmingly high and users are rarely exposed to the threats distributed via QR codes while surfing the web.
In this paper, we introduce an efficient route discovery mechanism to enhance the performance and multicast efficiency of the On-Demand Multicast Routing Protocol (ODMRP). Our framework, called limited flooding ODMRP, improves the multicasting mechanism by efficiently managing flooding based on the delay characteristics of the contributing nodes. In our model, only the nodes that satisfy the delay requirements can flood the Join-Query messages. We model the contributing nodes as M/M/1 queuing systems. Our framework considers the significant parameters in delay analysis, including random packet arrival, service process, and random channel access in the relaying nodes, and exhibits its best performance results under high traffic load. Simulation results reveal that limited flooding ODMRP drastically reduces the packet overhead under various simulation scenarios compared to the original ODMRP.
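The delay-based forwarding rule can be sketched with the textbook M/M/1 result: for arrival rate λ and service rate μ with λ < μ, the mean time a packet spends in the node is 1 / (μ − λ). The sketch below applies only that formula; the paper's delay analysis also accounts for random channel access, which is omitted here. The function names and the `max_delay` threshold are hypothetical.

```python
import math


def mm1_mean_delay(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue: 1 / (mu - lambda).

    Returns infinity when the queue is unstable (lambda >= mu).
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must satisfy arrival_rate >= 0 and service_rate > 0")
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


def may_flood_join_query(arrival_rate: float, service_rate: float, max_delay: float) -> bool:
    """A node rebroadcasts Join-Query only if its expected delay is acceptable."""
    return mm1_mean_delay(arrival_rate, service_rate) <= max_delay


# Example with rates in packets per second and delay in seconds.
lightly_loaded = may_flood_join_query(arrival_rate=8.0, service_rate=10.0, max_delay=1.0)
congested = may_flood_join_query(arrival_rate=9.5, service_rate=10.0, max_delay=1.0)
```

Under this rule, congested nodes stay out of route discovery, which is consistent with the claim that the benefit is largest under high traffic load.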
My current research interests are mainly in application and system security with special focus on malware analysis, file systems and operating system security.
There are always exciting things to discuss in our field of research. My goal is to post interesting things in systems security on a regular basis.