Web Security and Agentic Web Attacks

The Web's dynamic, interconnected nature creates persistent attack surfaces — from malicious third-party content and automated scams to brittle security mechanisms exploited by adversarial bots. We study these threats empirically at scale and build data-driven defenses that make web security behavior visible, measurable, and actionable.

  1. Observe

    Expose security-relevant behavior across web ecosystems, browser internals, runtime execution, and automated interactions at scale.

  2. Interpret

    Connect low-level evidence to attacker goals, campaign infrastructure, modus operandi, and operational risk.

  3. Prioritize

    Rank findings by user exposure, reachability, and harm so defensive attention lands where it matters most.

  4. Support action

    Produce deployed tools, disclosed CVEs, and shared datasets that defenders, developers, and researchers can act on.

  • Observe
    Large-scale crawling and browser instrumentation

    We instrument browser internals and deploy large-scale crawlers to capture third-party content loading, JavaScript execution paths, automated agent behavior, and attack page growth across millions of sites.

    Justified by: Outguard (WWW'19), Surveylance (S&P'18), Covid Scams (ASIA CCS'23), CAPTCHAs (WWW'24)
  • Interpret
    Mapping evidence to attacker campaigns and evasion strategies

    Raw measurements are analyzed to identify campaign infrastructure, modus operandi, provenance of injected content, and the fundamental limits of adversarial evasion against multi-modal defenses.

    Justified by: Covid Scams (ASIA CCS'23), OriginTracer (RAID'16), Bot Barrier (WWW'24), WebGuard (2024)
  • Prioritize
    Ranking threats by reachability, sector criticality, and user exposure

    Outguard identified 35 live campaigns; Surveylance found that 40% of scam pages were reachable directly from the Alexa top 30K; the CAPTCHA study identified over 3,100 CAPTCHA-protected sites in finance, government, and healthcare. These findings directly guide where defenses are needed first.

    Justified by: Outguard (WWW'19), Surveylance (S&P'18), CAPTCHAs (WWW'24), NodeSec (WWW'24)
  • Support action
    Deployed tools, CVE disclosures, and open datasets

    Research produces usable artifacts: WebGuard is an in-application forensics engine and PriveShield a privacy-preserving browser extension; NodeSec's disclosures resulted in 19 CVEs and two US-CERT cases; Outguard, Surveylance, and Excision provide detection infrastructure that security teams can adopt.

    Justified by: WebGuard (2024), PriveShield (2025), NodeSec (WWW'24), Outguard (WWW'19), Excision (FC'16)

Web Measurement & Attack Ecosystems

Observe Interpret
Outguard: Detecting In-Browser Covert Cryptocurrency Mining in the Wild
The Web Conference (WWW) 2019 — Best Paper Award (1 of 225 accepted)

Outguard instruments browser parallelism primitives — rather than CPU thresholds or static signatures — to detect cryptojacking campaigns that evade rule-based filters. Deployed in the wild, it identified 35 active campaigns, 6,328 cryptojacking websites, and 24 previously unreported mining services, demonstrating that behavioral fingerprinting generalizes across obfuscated variants.

Observe Interpret Prioritize
Surveylance: Automatically Detecting Online Survey Scams
IEEE Security & Privacy (S&P) 2018

Surveylance crawled the web to map the full infrastructure of survey scam ecosystems, uncovering 8,623 gateway websites that funneled victims into over 318,000 deceptive survey pages. A critical finding: 40% of those pages were reachable directly from the Alexa top 30K, exposing mainstream users to identity fraud, malware, and scareware at scale.

Observe Interpret
An End-to-End Analysis of Covid-Themed Scams in the Wild
ACM ASIA CCS 2023

This retrospective study tracked adversarial operations across the four months immediately following the Covid-19 outbreak (Feb–June 2020). By combining multiple measurement perspectives, it analyzes the composition, growth, and reachability of attack pages; reconstructs attacker modus operandi; and quantifies impact on end-users. The study shows how adversaries' technical and operational agility allowed novel attack techniques to bypass common defenses within weeks of a global crisis.


Browser Security & Content Integrity

Observe Support action
Include Me Out: In-Browser Detection of Malicious Third-Party Content Inclusions
Financial Cryptography and Data Security (FC) 2016

Excision is an in-browser framework that provides a high-fidelity, real-time view of third-party content inclusion by monitoring the JavaScript loading pipeline directly. This approach uncovered novel distribution infrastructure for malicious code that static server-side analysis misses entirely, and enabled real-time interdiction of content loading before harm reaches the user.

Interpret Support action
Identifying Extension-based Ad Injection via Fine-grained Web Content Provenance
RAID 2016

OriginTracer tracks the provenance of web content modifications caused by browser extensions during page load, exposing a class of ad-injection attacks that server-side analysis cannot see. The lightweight provenance mechanism illuminated how extensions silently rewrite pages and inspired follow-on privacy research on web tracking and advertisement delivery.

Support action
PriveShield: Enhancing User Privacy Using Automatic Isolated Profiles in Browsers
arXiv preprint, 2025

PriveShield is a browser extension that disrupts the information-gathering cycle of online tracking without requiring users to disable JavaScript or modify browser code. It automatically creates isolated profiles based on browsing history and site interactions, preventing cookie-based retargeting in 91% of 54 real-world test scenarios — while preserving the full browsing experience.
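The isolation idea can be illustrated with a toy model. This is a minimal sketch, assuming that each first-party site is mapped to a profile and that each profile owns a separate cookie jar. The class `IsolatedCookieStore`, its one-profile-per-site rule in `assign_profile`, and the example domains are hypothetical; PriveShield's actual profile assignment is described only as being based on browsing history and site interactions.

```python
from collections import defaultdict


class IsolatedCookieStore:
    """Toy model of per-profile cookie isolation (hypothetical, for illustration only)."""

    def __init__(self):
        # profile id -> {cookie domain -> {cookie name -> value}}
        self._jars = defaultdict(lambda: defaultdict(dict))
        # first-party site -> profile id
        self._site_profile = {}

    def assign_profile(self, first_party: str) -> str:
        # Hypothetical rule: every first-party site gets its own profile.
        # A real system could instead group related sites using browsing history.
        return self._site_profile.setdefault(
            first_party, f"profile-{len(self._site_profile)}"
        )

    def set_cookie(self, first_party: str, cookie_domain: str, name: str, value: str) -> None:
        # Cookies are stored in the jar of the profile for the page being visited.
        profile = self.assign_profile(first_party)
        self._jars[profile][cookie_domain][name] = value

    def get_cookies(self, first_party: str, cookie_domain: str) -> dict:
        # Only cookies from the current profile's jar are visible; return a copy.
        profile = self.assign_profile(first_party)
        return dict(self._jars[profile][cookie_domain])


store = IsolatedCookieStore()
# A tracker embedded on a shopping site sets an identifier cookie.
store.set_cookie("shop.example", "tracker.example", "uid", "abc123")
# The same tracker embedded on a news site cannot read it, so it cannot retarget.
print(store.get_cookies("news.example", "tracker.example"))  # {}
print(store.get_cookies("shop.example", "tracker.example"))  # {'uid': 'abc123'}
```

Without isolation, both lookups would hit one shared jar and the tracker would recognize the same user on both sites; splitting the jar per profile breaks that link.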


Web Application Security

Observe Prioritize Support action
(In)Security of File Uploads in Node.js
The Web Conference (WWW) 2024

NodeSec is an automated tool that generates unique payloads and evaluates web applications against 13 distinct Unrestricted File Upload (UFU) attack types. Applied to the most popular Node.js file upload libraries and real-world applications, it disclosed serious security bugs — resulting in 19 CVEs and two US-CERT cases. The study provides strong evidence that the dynamic features of Node.js applications introduce systematic security shortcomings in upload handling.


Automated Agent Detection & CAPTCHA Security

Observe Prioritize
The Matter of CAPTCHAs: An Analysis of a Brittle Security Feature on the Modern Web
The Web Conference (WWW) 2024

This study evaluates the real-world security of text-based CAPTCHAs across the web by integrating a pre-trained solver, which needs no large training dataset, into an automated web scanner. The scanner solved more than 20% of previously unseen CAPTCHAs on the first attempt. Most critically, the study identified over 3,100 CAPTCHA-protected sites in high-risk sectors (finance, government, and healthcare) where this asymmetric attack capability poses serious operational risk.

Interpret Support action
EnSolver: Uncertainty-Aware Ensemble CAPTCHA Solvers with Theoretical Guarantees
Journal of Machine Learning Research (JMLR) 2024

EnSolver equips CAPTCHA solvers with deep-ensemble uncertainty estimation, enabling them to detect and skip out-of-distribution challenges instead of attempting and failing them, since failed attempts can trigger lockout defenses. Because it reasons about its own confidence, EnSolver is harder to detect and block. The work proves novel theoretical bounds on solver effectiveness and validates them experimentally, providing a rigorous framework for assessing CAPTCHA robustness.
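The skip-when-uncertain behavior can be sketched as follows. This assumes each ensemble member outputs a per-character probability vector (plain lists stand in for trained networks here), uses the predictive entropy of the averaged distribution as the uncertainty score, and uses a hypothetical threshold `max_entropy=0.5`. These are common deep-ensemble choices, not necessarily the exact scoring rule or thresholds used by EnSolver.

```python
import math
from typing import List, Optional, Sequence


def ensemble_predict(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by each ensemble member."""
    m = len(member_probs)
    return [sum(p[c] for p in member_probs) / m for c in range(len(member_probs[0]))]


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def solve_or_skip(member_probs, alphabet: str, max_entropy: float = 0.5) -> Optional[str]:
    """Predict one character, or return None (abstain) if the ensemble is too uncertain."""
    avg = ensemble_predict(member_probs)
    if predictive_entropy(avg) > max_entropy:
        return None  # likely out-of-distribution
    best = max(range(len(avg)), key=avg.__getitem__)
    return alphabet[best]


def solve_challenge(per_char_member_probs, alphabet: str, max_entropy: float = 0.5) -> Optional[str]:
    """Solve a whole challenge, skipping it if any character is too uncertain."""
    answer = []
    for member_probs in per_char_member_probs:
        ch = solve_or_skip(member_probs, alphabet, max_entropy)
        if ch is None:
            return None  # request a fresh challenge rather than risk a failed attempt
        answer.append(ch)
    return "".join(answer)


# Three hypothetical members that agree (low entropy) vs. disagree (high entropy).
confident = [[0.90, 0.05, 0.05], [0.85, 0.10, 0.05], [0.95, 0.03, 0.02]]
disagree = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]

print(solve_challenge([confident, confident], "abc"))  # aa
print(solve_challenge([confident, disagree], "abc"))   # None
```

Members that individually look confident but disagree produce a near-uniform average, so the entropy check catches inputs that a single model would have answered, and failed, with false confidence.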

Interpret
Breaking the Bot Barrier: Evaluating Adversarial AI Techniques Against Multi-Modal Defense Models
The Web Conference Companion (WWW) 2024

Multi-modal bot detection models that analyze spatio-temporal browser events are increasingly deployed as a defense against credential stuffing and scanning. This work trains an LSTM on 825,701 artifacts from 46 users and develops two adversarial attacks — fast gradient and brute force — that generate misclassified behavioral vectors. A key finding: despite generating adversarially valid vectors, translating them into real-world scanning experiments was infeasible due to fundamental limitations of automated tools in satisfying the required spatio-temporal constraints.

Interpret Support action
WebGuard: In-Application Defense Against Evasive Web Scans through Behavioral Analysis
arXiv preprint, 2024

WebGuard is a low-overhead forensics engine that integrates directly into web applications — with no changes to underlying software or infrastructure — to detect and attribute automated web scanners in real time. It monitors multi-modal signals including spatio-temporal data and browser events, achieving detection within hundreds of milliseconds at under 10 KB/s communication overhead. Information-theoretic analysis shows that multi-modal monitoring significantly outperforms uni-modal (mouse-movement-only) approaches in both detection speed and attribution accuracy.

