The Web's dynamic, interconnected nature creates persistent attack surfaces — from malicious third-party content and automated scams to brittle security mechanisms exploited by adversarial bots. We study these threats empirically at scale and build data-driven defenses that make web security behavior visible, measurable, and actionable.
Expose security-relevant behavior across web ecosystems, browser internals, runtime execution, and automated interactions at scale.
Connect low-level evidence to attacker goals, campaign infrastructure, modus operandi, and operational risk.
Rank findings by user exposure, reachability, and harm so defensive attention lands where it matters most.
Produce deployed tools, disclosed CVEs, and shared datasets that defenders, developers, and researchers can act on.
We instrument browser internals and deploy large-scale crawlers to capture third-party content loading, JavaScript execution paths, automated agent behavior, and attack page growth across millions of sites.
Justified by: Outguard (WWW'19) • Surveylance (S&P'18) • Covid Scams (ASIA CCS'23) • CAPTCHAs (WWW'24)
Raw measurements are analyzed to identify campaign infrastructure, modus operandi, provenance of injected content, and the fundamental limits of adversarial evasion against multi-modal defenses.
Justified by: Covid Scams (ASIA CCS'23) • OriginTracer (RAID'16) • Bot Barrier (WWW'24) • WebGuard (2024)
Outguard identified 35 live campaigns; Surveylance traced 40% of scam pages back to the Alexa top 30K; the CAPTCHA study found 3,100+ affected sites in finance, government, and health, directly guiding where defenses are needed first.
Justified by: Outguard (WWW'19) • Surveylance (S&P'18) • CAPTCHAs (WWW'24) • NodeSec (WWW'24)
Research produces usable artifacts: WebGuard and PriveShield are deployed in-application engines; NodeSec produced 19 CVEs and two US-CERT cases; Outguard, Surveylance, and Excision provide detection infrastructure security teams can adopt.
Justified by: WebGuard (2024) • PriveShield (2025) • NodeSec (WWW'24) • Outguard (WWW'19) • Excision (FC'16)
Outguard instruments browser parallelism primitives, rather than CPU thresholds or static signatures, to detect cryptojacking campaigns that evade rule-based filters. Deployed in the wild, it identified 35 active campaigns, 6,328 cryptojacking websites, and 24 previously unreported mining services, demonstrating that behavioral fingerprinting generalizes across obfuscated variants.
Surveylance crawled the web to map the full infrastructure of survey scam ecosystems, uncovering 8,623 gateway websites that funneled victims into over 318,000 deceptive survey pages. A critical finding: 40% of those pages were reachable directly from the Alexa top 30K, exposing mainstream users to identity fraud, malware, and scareware at scale.
This retrospective study tracked adversarial operations across the four months immediately following the Covid-19 outbreak (Feb–June 2020). By combining multiple measurement perspectives, it analyzes the composition, growth, and reachability of attack pages; reconstructs attacker modus operandi; and quantifies impact on end-users. The study shows how adversaries' technical and operational agility allowed novel attack techniques to bypass common defenses within weeks of a global crisis.
Excision is an in-browser framework that provides a high-fidelity, real-time view of third-party content inclusion by monitoring the JavaScript loading pipeline directly. This approach uncovered novel distribution infrastructure for malicious code that static server-side analysis misses entirely, and enabled real-time interdiction of content loading before harm reaches the user.
OriginTracer tracks the provenance of web content modifications caused by browser extensions during page load, exposing a class of ad-injection attacks that server-side analysis cannot see. The lightweight provenance mechanism illuminated how extensions silently rewrite pages and inspired follow-on privacy research on web tracking and advertisement delivery.
PriveShield is a browser extension that disrupts the information-gathering cycle of online tracking without requiring users to disable JavaScript or modify browser code. It automatically creates isolated profiles based on browsing history and site interactions, preventing cookie-based retargeting in 91% of 54 real-world test scenarios — while preserving the full browsing experience.
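The isolation idea can be illustrated with a minimal Python sketch. It is not PriveShield's implementation: the class name, the site-to-profile mapping, and the example domains are all hypothetical, and the real extension derives profiles automatically from browsing history and site interactions. The sketch only shows why a tracker identifier set in one profile cannot be read back from another.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class ProfileIsolatedCookies:
    """Toy cookie store that keeps one separate jar per browsing profile."""

    def __init__(self, site_to_profile: Dict[str, str]) -> None:
        # Hypothetical, hand-written mapping from first-party site to profile.
        self.site_to_profile = site_to_profile
        # profile name -> {(tracker domain, cookie name): value}
        self.jars: Dict[str, Dict[Tuple[str, str], str]] = defaultdict(dict)

    def profile_for(self, site: str) -> str:
        # Sites without an assigned profile get a profile of their own.
        return self.site_to_profile.get(site, f"default:{site}")

    def set_cookie(self, first_party: str, tracker: str, name: str, value: str) -> None:
        self.jars[self.profile_for(first_party)][(tracker, name)] = value

    def get_cookie(self, first_party: str, tracker: str, name: str) -> Optional[str]:
        return self.jars[self.profile_for(first_party)].get((tracker, name))


store = ProfileIsolatedCookies({
    "shop.example": "shopping",
    "deals.example": "shopping",
    "news.example": "news",
})

# A tracker embedded on the shopping site stores an identifier.
store.set_cookie("shop.example", "tracker.example", "uid", "abc123")

same_profile = store.get_cookie("deals.example", "tracker.example", "uid")
other_profile = store.get_cookie("news.example", "tracker.example", "uid")
```

Here `same_profile` holds the stored identifier while `other_profile` is `None`, so the tracker cannot link the shopping visit to the news visit through that cookie.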
NodeSec is an automated tool that generates unique payloads and evaluates web applications against 13 distinct Unrestricted File Upload (UFU) attack types. Applied to the most popular Node.js file upload libraries and real-world applications, it disclosed serious security bugs — resulting in 19 CVEs and two US-CERT cases. The study provides strong evidence that the dynamic features of Node.js applications introduce systematic security shortcomings in upload handling.
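A toy harness gives a feel for this kind of evaluation. Everything in it is an assumption made for illustration: the four filename variants are well-known upload-filter bypass patterns rather than NodeSec's 13 attack types, and `naive_validator` is a hypothetical weak check, not code from any real library.

```python
from typing import Callable, Dict, List


def filename_variants(base: str = "report",
                      blocked_ext: str = "js",
                      allowed_ext: str = "png") -> Dict[str, str]:
    """Build a few illustrative filename variants for testing an upload filter."""
    return {
        "double_extension": f"{base}.{allowed_ext}.{blocked_ext}",
        "case_variation": f"{base}.{blocked_ext.upper()}",
        "trailing_dot": f"{base}.{blocked_ext}.",
        "path_traversal": f"../{base}.{allowed_ext}",
    }


def naive_validator(filename: str) -> bool:
    """Hypothetical weak filter: rejects only names ending exactly in '.js'."""
    return not filename.endswith(".js")


def accepted_variants(validator: Callable[[str], bool],
                      variants: Dict[str, str]) -> List[str]:
    """Return the names of the variants that the validator lets through."""
    return sorted(name for name, filename in variants.items() if validator(filename))


bypasses = accepted_variants(naive_validator, filename_variants())
```

Against this deliberately weak filter, `bypasses` lists every variant except the double extension, which is the kind of per-attack-type result an automated evaluation would report for each library under test.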
This study evaluates the real-world security of text-based CAPTCHAs across the web by integrating a pre-trained solver into an automated web scanner — without a large training dataset. The scanner cracked more than 20% of previously unseen CAPTCHAs in a single attempt. Most critically, the study identified over 3,100 CAPTCHA-protected sites in high-risk sectors — finance, government, and healthcare — where this asymmetric attack capability poses serious operational risk.
EnSolver equips CAPTCHA solvers with deep ensemble uncertainty estimation, enabling them to detect and skip out-of-distribution challenges rather than failing them — which would trigger lockout defenses. By reasoning about its own confidence, EnSolver is harder to detect and block. Novel theoretical bounds on solver effectiveness are proved and validated experimentally, providing a rigorous framework for assessing CAPTCHA robustness.
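The skip-when-uncertain behavior can be sketched in a few lines of Python. This is an illustration under stated assumptions, not EnSolver's published algorithm: it uses the entropy of the ensemble-averaged prediction as the uncertainty measure (one common choice for deep ensembles), and the function names and the `max_entropy` threshold are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def mean_distribution(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class probabilities predicted by each ensemble member."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(m[i] for m in member_probs) / n_members for i in range(n_classes)]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def predict_or_skip(member_probs: Sequence[Sequence[float]],
                    max_entropy: float = 0.5) -> Optional[int]:
    """Return the predicted class index, or None to skip the challenge.

    max_entropy is a hypothetical threshold chosen for this example.
    """
    avg = mean_distribution(member_probs)
    if entropy(avg) > max_entropy:
        return None  # members disagree or are unsure: treat as out-of-distribution
    return max(range(len(avg)), key=lambda i: avg[i])


# Three members agree on class 0: the ensemble answers.
confident = predict_or_skip([[0.97, 0.02, 0.01],
                             [0.95, 0.03, 0.02],
                             [0.96, 0.02, 0.02]])

# Each member picks a different class: the ensemble abstains.
uncertain = predict_or_skip([[0.90, 0.05, 0.05],
                             [0.05, 0.90, 0.05],
                             [0.05, 0.05, 0.90]])
```

When the members agree, `confident` is the shared class index `0`; when they disagree, the averaged distribution is close to uniform, its entropy exceeds the threshold, and `uncertain` is `None`, so no wrong answer is submitted.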
Multi-modal bot detection models that analyze spatio-temporal browser events are increasingly deployed as a defense against credential stuffing and scanning. This work trains an LSTM on 825,701 artifacts from 46 users and develops two adversarial attacks — fast gradient and brute force — that generate misclassified behavioral vectors. A key finding: despite generating adversarially valid vectors, translating them into real-world scanning experiments was infeasible due to fundamental limitations of automated tools in satisfying the required spatio-temporal constraints.
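The fast gradient attack can be illustrated on a much simpler model. The sketch below swaps the LSTM for a hand-written logistic regression so the gradient can be computed in closed form; the weights, feature vector, and step size `eps` are made-up values, not data from the study.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bot_probability(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Logistic model standing in for the detector: P(bot | x)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fast_gradient_step(w: Sequence[float], b: float, x: Sequence[float],
                       y: int, eps: float) -> List[float]:
    """One fast-gradient-sign step that increases the loss for true label y.

    For binary cross-entropy on a logistic model, the gradient of the loss
    with respect to the input is (p - y) * w.
    """
    p = bot_probability(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Made-up detector weights and a behavioral vector the detector flags as a bot.
w, b = [2.0, -1.0, 0.5], 0.0
x = [1.0, 0.2, 0.4]

x_adv = fast_gradient_step(w, b, x, y=1, eps=0.8)
before = bot_probability(w, b, x)
after = bot_probability(w, b, x_adv)
```

With these values, `before` is above 0.5 and `after` falls below it, so the perturbed vector is misclassified. Note that the perturbation lives only in feature space: nothing guarantees that an automated tool could produce browser events matching `x_adv`, which is the gap the study reports.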
WebGuard is a low-overhead forensics engine that integrates directly into web applications — with no changes to underlying software or infrastructure — to detect and attribute automated web scanners in real time. It monitors multi-modal signals including spatio-temporal data and browser events, achieving detection within hundreds of milliseconds at under 10 KB/s communication overhead. Information-theoretic analysis shows that multi-modal monitoring significantly outperforms uni-modal (mouse-movement-only) approaches in both detection speed and attribution accuracy.