Malicious Code Analysis

Research Methodology

01 Observe
Instrument systems at the kernel level to capture fine-grained behavioral traces of malicious code across families, deployment scenarios, and evasion techniques.
02 Interpret
Connect low-level execution traces to attacker goals, evasion strategies, campaign attribution, and the structural properties that distinguish malicious behavior from benign activity.
03 Prioritize
Rank families and techniques by destructive capability, prevalence, and the feasibility of practical defense — focusing effort on threats where intervention is both urgent and tractable.
04 Support action
Release open-source platforms, large labeled datasets, and deployed endpoint solutions that lower the cost of future research and give practitioners usable defenses.

Applied to Malicious Code Analysis

Observe
Kernel-level instrumentation at longitudinal scale
Unveil ran continuously for 27 months, analyzing over 2 million samples by instrumenting Windows kernel components. ShadowBox and Lase push this further with in-kernel, low-artifact engines that capture system-wide temporal data (processes, threads, I/O requests, DLL injections) while passing the anti-analysis checks that modern malware uses to detect sandboxes.
Justified by: Unveil (USENIX'16) • ShadowBox (eCrime'25) • Lase (2025) • Ransomware DIMVA (2014)
Interpret
Connecting execution traces to attacker goals and evasion strategies
The foundational ransomware study showed that monitoring the NTFS Master File Table (MFT) and I/O requests exposes the destructive intent of even sophisticated encryptors. SCRUTINIZER maps code reuse across campaigns via ML-based function encoding, while Forged Signatures traces how certificate hijacking exploits differential trust decisions across browsers and operating systems.
Justified by: Ransomware DIMVA (2014) • Unveil (USENIX'16) • SCRUTINIZER (2021) • Forged Signatures (IEMCON'25)
Prioritize
Identifying which threats are tractable and where defenses matter most
The DIMVA study showed that, despite growing family counts, most ransomware in the wild used superficial file-system techniques, which makes kernel-level I/O monitoring a high-leverage defensive target. Unveil classified 280K samples into 26 families to establish ground-truth family distributions. Forged Signatures shows that browsers and operating systems respond inconsistently to certificate abuse, pinpointing where the attack surface is most open.
Justified by: Ransomware DIMVA (2014) • Unveil (USENIX'16) • Forged Signatures (IEMCON'25)
Support action
Open platforms, shared datasets, and lightweight endpoint defenses
Unveil shared 10 TB of labeled data that has been used in 20+ follow-on papers. Redemption achieved zero data loss against 29 ransomware families with minimal filesystem changes. ShadowBox and Lase are open-source, portable engines whose execution trace datasets are released to lower engineering costs for the community. USBeSafe deploys as a background OS service and requires no USB protocol modifications.
Justified by: Unveil (USENIX'16) • Redemption (RAID'17) • USBeSafe (RAID'19) • ShadowBox (eCrime'25) • Lase (2025)

Ransomware: From Observation to Defense

Observe • Interpret • Prioritize

Cutting the Gordian Knot: A Look Under the Hood of Ransomware Attacks

DIMVA 2014

The first systematic long-term study of ransomware in the wild, covering 1,359 samples across 15 families observed between 2006 and 2014. Despite the narrative of ever-increasing sophistication, the analysis found that the majority of families relied on superficial file-system operations, and that monitoring I/O requests and protecting the NTFS Master File Table (MFT) are sufficient to detect and stop a large fraction of zero-day ransomware attacks, including those using advanced encryption.
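
To make the idea of I/O-request monitoring concrete, here is a minimal sketch in Python. It is not the paper's detector: it assumes a hypothetical trace format of (pid, op, path, payload) tuples and uses a common heuristic, flagging processes that read files and then overwrite them with high-entropy data. The threshold values and function names are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(data: bytes) -> float:
    """Return entropy in bits per byte: near 0 for constant data, 8 for uniform random data."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def suspicious_processes(trace, entropy_threshold=7.0, min_files=3):
    """Flag pids that overwrite previously read files with high-entropy data.

    `trace` is an iterable of (pid, op, path, payload) tuples. This format is
    hypothetical; a real monitor would obtain such events from kernel-level
    I/O instrumentation. Both thresholds are assumed values for illustration.
    """
    read_paths = defaultdict(set)    # pid -> paths the process has read
    overwritten = defaultdict(set)   # pid -> paths rewritten with high-entropy data
    for pid, op, path, payload in trace:
        if op == "read":
            read_paths[pid].add(path)
        elif op == "write" and path in read_paths[pid]:
            if shannon_entropy(payload) >= entropy_threshold:
                overwritten[pid].add(path)
    return {pid for pid, paths in overwritten.items() if len(paths) >= min_files}


if __name__ == "__main__":
    random_like = bytes(range(256)) * 4   # every byte value equally frequent
    text_like = b"hello world " * 50      # low-entropy plaintext
    trace = []
    for name in ("a.doc", "b.doc", "c.doc"):
        trace.append((1, "read", name, b""))
        trace.append((1, "write", name, random_like))
        trace.append((2, "read", name, b""))
        trace.append((2, "write", name, text_like))
    print(suspicious_processes(trace))
```

In this toy trace only process 1 is flagged, because process 2 writes low-entropy text back to the files it read.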


Observe • Prioritize • Support action

UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware

USENIX Security 2016

Unveil instruments Windows kernel components to generate semantically rich execution traces that resist common anti-analysis fingerprinting. Deployed continuously for 27 months, it analyzed over two million samples and produced a dataset of 280,000+ ransomware samples spanning 26 families, plus 132,000 trojans. The 10 TB dataset was shared with the research community and used in 20+ follow-on papers, making it one of the most widely used malware analysis datasets in the field.


Interpret • Support action

Redemption: Real-time Protection Against Ransomware at End-Hosts

RAID 2017

Redemption explores whether minimal, targeted filesystem modifications can prevent previously unknown ransomware from causing data loss — achieving zero data loss against 29 contemporary families. The key insight is that a practical kernel-level I/O policy can generalize across diverse ransomware behaviors without requiring prior knowledge of specific samples. The generated dataset and technique inspired 10+ follow-on defenses from the research community.


Endpoint & Device Security

Interpret • Support action

USBeSafe: An End-Point Solution to Protect Against USB-Based Attacks

RAID 2019

BadUSB attacks hide malicious code in USB firmware, allowing devices to impersonate keyboards and inject keystrokes silently. USBeSafe trains a machine learning model on benign USB traffic patterns, then disables offending ports transparently — with no changes to the USB protocol or user experience. Validated over a 20-day deployment on real-user machines, the system demonstrates that a lightweight OS-level service can close a hardware-layer attack surface without any infrastructure changes.
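
The learn-from-benign-traffic idea can be illustrated with a small Python sketch. It is not USBeSafe's model: the paper's features and learning algorithm are not described here, so this stand-in uses a single assumed feature (inter-keystroke interval in milliseconds) and a simple z-score test against a benign baseline. All names and thresholds are hypothetical.

```python
from statistics import mean, stdev


def fit_baseline(benign_intervals_ms):
    """Learn the mean and sample standard deviation of benign inter-keystroke intervals."""
    return mean(benign_intervals_ms), stdev(benign_intervals_ms)


def is_anomalous(session_intervals_ms, baseline, z_threshold=3.0):
    """Return True when a session's average interval deviates strongly from the baseline.

    The single feature and the z-score threshold are illustrative assumptions;
    a deployed system would use richer traffic features and a trained model.
    """
    mu, sigma = baseline
    if sigma == 0:
        return mean(session_intervals_ms) != mu
    z = abs(mean(session_intervals_ms) - mu) / sigma
    return z > z_threshold


if __name__ == "__main__":
    benign = [120, 150, 180, 140, 160, 130, 170, 155]   # human typing, in ms
    baseline = fit_baseline(benign)
    injected = [2, 3, 2, 3, 2]                          # machine-speed keystroke injection
    print(is_anomalous(injected, baseline))
    print(is_anomalous([140, 160, 150], baseline))
```

A positive result would be the trigger for a response such as disabling the offending port; that response step is outside the scope of this sketch.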


Malware Analysis Platforms & Forensics

Observe • Support action

ShadowBox: A Low-Artifact Framework for Analyzing Evasive Malicious Code

APWG eCrime 2025 — presented in San Diego, September 2025

ShadowBox is an open-source, portable analysis framework that provides system-wide monitoring capabilities while satisfying the contemporary anti-analysis checks that modern malicious code uses to detect sandboxes. The framework achieves a carefully designed balance between visibility and artifact minimization — a historically difficult trade-off. ShadowBox and its execution trace dataset are released to the research community to lower the engineering cost of threat analysis and support longitudinal behavioral catalogs across security domains.


Observe • Support action

Lase: An In-kernel Forensics Engine for Investigating Evasive Attacks

arXiv preprint, 2025

Lase is an open-source, low-artifact forensics engine that runs in privileged kernel mode, making it nearly impossible for user-mode malware to fingerprint, tamper with, or kill the monitor. It captures the system-wide temporal data essential for recording the behavior of evasive attacks: processes, threads, and I/O requests, including synchronous, asynchronous, and fast I/O. Two deployment scenarios are demonstrated: bare-metal large-scale malware analysis on physical machines, and a distributed deception-based infrastructure for in-cloud threat reasoning.


Interpret • Support action

SCRUTINIZER: Detecting Code Reuse in Malware via Decompilation and Machine Learning

2021

SCRUTINIZER provides automated campaign attribution by identifying code reuse across malware samples at the function level. Using an unsupervised ML approach to filter irrelevant functions before comparison, it builds a knowledge base of tagged campaigns and identifies how much overlap unknown samples share with known actors. The system identified 12 previously unknown samples connected to known campaigns, demonstrating that function-level encoding generalizes across obfuscation and compiler variation.
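
The overlap computation against a knowledge base of tagged campaigns can be sketched as follows. This is an illustration under stated assumptions, not SCRUTINIZER's implementation: the function encodings are taken as precomputed numeric vectors (the encoder and the unsupervised filtering step are not reproduced), matching uses cosine similarity, and the 0.9 threshold is a hypothetical value.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def campaign_overlap(sample_functions, knowledge_base, threshold=0.9):
    """For each campaign, return the fraction of sample functions that match it.

    `sample_functions` is a list of function encodings for the unknown sample;
    `knowledge_base` maps a campaign tag to a list of known function encodings.
    Both structures and the threshold are assumptions made for this sketch.
    """
    overlap = {}
    for campaign, known_functions in knowledge_base.items():
        matched = sum(
            1
            for f in sample_functions
            if any(cosine(f, k) >= threshold for k in known_functions)
        )
        overlap[campaign] = matched / len(sample_functions) if sample_functions else 0.0
    return overlap


if __name__ == "__main__":
    knowledge_base = {
        "campaign_A": [[1, 0, 0], [0, 1, 0]],
        "campaign_B": [[0, 0, 1]],
    }
    unknown_sample = [[1, 0, 0], [0, 1, 0.05], [0.7, 0.7, 0], [0, 0, 1]]
    print(campaign_overlap(unknown_sample, knowledge_base))
```

Here two of the four functions match campaign_A and one matches campaign_B, so the unknown sample would be attributed more strongly to campaign_A.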


Code Integrity & Supply Chain

Interpret • Prioritize • Support action

Evaluating Security Checks Against Malicious Payloads with Forged Signatures

IEEE IEMCON 2025

Adversaries increasingly hijack legitimate code-signing certificates and attach them to malicious binaries to deceive browsers and operating systems into allowing execution. This study empirically evaluates how modern browsers respond to untrusted, signed malicious binaries, revealing that browser responses differ significantly from one another — and that the OS may respond ineffectively, leaving users vulnerable to a straightforward and low-cost adversarial tactic. The paper shows that a browser extension can significantly reduce the attack surface exposed by certificate abuse.

