Understanding what malicious software actually does — at scale, under adversarial conditions, and before significant damage is done — requires instrumenting systems at the right level of abstraction. We build analysis platforms that generate semantically rich behavioral traces, classify malware families automatically, and protect end-users without introducing perceptible overhead.
Instrument systems at the kernel level to capture fine-grained behavioral traces of malicious code across families, deployment scenarios, and evasion techniques.
Connect low-level execution traces to attacker goals, evasion strategies, campaign attribution, and the structural properties that distinguish malicious behavior from benign activity.
Rank families and techniques by destructive capability, prevalence, and the feasibility of practical defense — focusing effort on threats where intervention is both urgent and tractable.
Release open-source platforms, large labeled datasets, and deployed endpoint solutions that lower the cost of future research and give practitioners usable defenses.
Unveil instrumented Windows kernel components and ran continuously for 27 months, analyzing over 2 million samples. ShadowBox and Lase push this further with in-kernel, low-artifact engines that capture system-wide temporal data (processes, threads, I/O requests, DLL injections) while passing the anti-analysis checks that modern malware uses to detect sandboxes.
Justified by: Unveil (USENIX'16) • ShadowBox (eCrime'25) • Lase (2025) • Ransomware DIMVA (2014)
The foundational ransomware study revealed that monitoring MFT and I/O requests exposes the destructive intent of even sophisticated encryptors. SCRUTINIZER maps code reuse across campaigns via ML-based function encoding, while Forged Signatures traces how certificate hijacking exploits differential trust decisions across browsers and operating systems.
Justified by: Ransomware DIMVA (2014) • Unveil (USENIX'16) • SCRUTINIZER (2021) • Forged Signatures (IEMCON'25)
The DIMVA study showed that despite growing family counts, most ransomware in the wild used superficial file-system techniques, which makes kernel-level I/O monitoring a high-leverage defensive target. Unveil classified 26 families across 280K samples to establish ground-truth family distributions. Forged Signatures shows that browsers and operating systems respond inconsistently to certificate abuse, pinpointing where the attack surface is most open.
Justified by: Ransomware DIMVA (2014) • Unveil (USENIX'16) • Forged Signatures (IEMCON'25)
Unveil shared 10TB of labeled data used in 20+ follow-on papers. Redemption achieved zero data loss against 29 ransomware families with minimal filesystem changes. ShadowBox and Lase are open-source, portable engines with execution trace datasets released to lower community engineering costs. USBeSafe deploys as a background OS service with no USB protocol modifications.
Justified by: Unveil (USENIX'16) • Redemption (RAID'17) • USBeSafe (RAID'19) • ShadowBox (eCrime'25) • Lase (2025)
The first systematic long-term study of ransomware in the wild, covering 1,359 samples across 15 families observed between 2006 and 2014. Despite the narrative of ever-increasing sophistication, the analysis found that the majority of families relied on superficial file-system operations. It also found that monitoring I/O requests and protecting the NTFS Master File Table (MFT) is sufficient to detect and stop a large fraction of zero-day ransomware attacks, including those using advanced encryption.
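To make the idea of I/O-request monitoring concrete, here is a minimal sketch of how a trace of write requests might be scored. It is not the method from the study: the event tuple format, the entropy threshold, and the `min_files` cutoff are all assumptions chosen for illustration, and high write entropy is used only as a rough proxy for encrypted output.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a buffer in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def suspicious_processes(events, entropy_threshold=7.0, min_files=3):
    """Flag process IDs that write high-entropy data to many distinct files.

    `events` is an iterable of (pid, operation, path, buffer) tuples.
    This format is hypothetical; a real kernel trace carries far more detail.
    """
    touched = defaultdict(set)
    for pid, operation, path, buffer in events:
        if operation == "write" and shannon_entropy(buffer) >= entropy_threshold:
            touched[pid].add(path)
    return {pid for pid, paths in touched.items() if len(paths) >= min_files}
```

A practical monitor would combine a signal like this with others (deletes, renames, MFT access patterns), since compressed or already-encrypted benign files also produce high-entropy writes.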
Unveil instruments Windows kernel components to generate semantically rich execution traces that resist common anti-analysis fingerprinting. Deployed continuously for 27 months, it analyzed over two million samples and produced a dataset of 280,000+ ransomware samples spanning 26 families, plus 132,000 trojans. The 10TB dataset was shared with the research community and used in 20+ follow-on papers, making Unveil one of the most widely used malware analysis datasets in the field.
Redemption explores whether minimal, targeted filesystem modifications can prevent previously unknown ransomware from causing data loss — achieving zero data loss against 29 contemporary families. The key insight is that a practical kernel-level I/O policy can generalize across diverse ransomware behaviors without requiring prior knowledge of specific samples. The generated dataset and technique inspired 10+ follow-on defenses from the research community.
BadUSB attacks hide malicious code in USB firmware, allowing devices to impersonate keyboards and inject keystrokes silently. USBeSafe trains a machine learning model on benign USB traffic patterns, then disables offending ports transparently — with no changes to the USB protocol or user experience. Validated over a 20-day deployment on real-user machines, the system demonstrates that a lightweight OS-level service can close a hardware-layer attack surface without any infrastructure changes.
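The idea of learning from benign traffic and flagging deviations can be sketched as follows. This is a deliberately simplified stand-in, not USBeSafe's actual model or feature set: it assumes a single hypothetical feature (inter-keystroke intervals in milliseconds) and a z-score threshold, where the real system learns from richer USB traffic patterns.

```python
import statistics


def fit_benign_profile(intervals_ms):
    """Summarize benign inter-keystroke intervals as (mean, standard deviation)."""
    return statistics.mean(intervals_ms), statistics.stdev(intervals_ms)


def is_anomalous(profile, observed_ms, z_threshold=3.0):
    """Return True if the median observed interval is far from the benign mean.

    `z_threshold` is an illustrative value, not one taken from the paper.
    """
    mean, std = profile
    observed = statistics.median(observed_ms)
    if std == 0:
        # Degenerate profile: any deviation from the mean counts as anomalous.
        return observed != mean
    return abs(observed - mean) / std > z_threshold


# Hypothetical training data: a human typist's intervals in milliseconds.
benign = [120, 180, 150, 210, 90, 160, 140, 200]
profile = fit_benign_profile(benign)
```

With this profile, a burst of keystrokes a few milliseconds apart, as a scripted injection might produce, scores as anomalous, while ordinary typing does not. A deployed service would then act on the flag, for example by disabling the offending port.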
ShadowBox is an open-source, portable analysis framework that provides system-wide monitoring capabilities while satisfying the contemporary anti-analysis checks that modern malicious code uses to detect sandboxes. The framework achieves a carefully designed balance between visibility and artifact minimization — a historically difficult trade-off. ShadowBox and its execution trace dataset are released to the research community to lower the engineering cost of threat analysis and support longitudinal behavioral catalogs across security domains.
Lase is an open-source, low-artifact forensics engine that operates in highly privileged kernel mode, making it nearly impossible for user-mode malware to fingerprint, tamper with, or kill the monitor. It captures system-wide temporal data (processes, threads, I/O requests, synchronous and asynchronous I/Os, fast I/Os) that is essential for recording the behavior of evasive attacks. Two deployment scenarios are demonstrated: bare-metal large-scale malware analysis on physical machines, and a distributed deception-based infrastructure for in-cloud threat reasoning.
SCRUTINIZER provides automated campaign attribution by identifying code reuse across malware samples at the function level. Using an unsupervised ML approach to filter irrelevant functions before comparison, it builds a knowledge base of tagged campaigns and identifies how much overlap unknown samples share with known actors. The system identified 12 previously unknown samples connected to known campaigns, demonstrating that function-level encoding generalizes across obfuscation and compiler variation.
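The overlap computation described above can be illustrated with a small sketch. It is not SCRUTINIZER's implementation: the exact SHA-256 `fingerprint` below is a placeholder for the ML-based function encoding, and the knowledge-base layout and `min_overlap` threshold are assumptions. Unlike a learned encoding, an exact hash does not survive obfuscation or compiler variation.

```python
import hashlib


def fingerprint(function_bytes: bytes) -> str:
    """Placeholder function encoding: an exact hash of the function's bytes."""
    return hashlib.sha256(function_bytes).hexdigest()


def attribute(sample_functions, knowledge_base, min_overlap=0.5):
    """Rank known campaigns by the share of the sample's functions they contain.

    `knowledge_base` maps a campaign name to a set of function fingerprints
    (a hypothetical layout). Returns (campaign, overlap) pairs, best first.
    """
    sample = {fingerprint(f) for f in sample_functions}
    if not sample:
        return []
    matches = []
    for campaign, known in knowledge_base.items():
        overlap = len(sample & known) / len(sample)
        if overlap >= min_overlap:
            matches.append((campaign, overlap))
    return sorted(matches, key=lambda item: item[1], reverse=True)
```

Filtering out irrelevant functions (library code, compiler stubs) before this step matters in practice, because shared boilerplate would otherwise inflate the overlap between unrelated samples.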
Adversaries increasingly hijack legitimate code-signing certificates and attach them to malicious binaries to deceive browsers and operating systems into allowing execution. This study empirically evaluates how modern browsers respond to untrusted, signed malicious binaries, revealing that browser responses differ significantly from one another — and that the OS may respond ineffectively, leaving users vulnerable to a straightforward and low-cost adversarial tactic. The paper shows that a browser extension can significantly reduce the attack surface exposed by certificate abuse.