Web Security and Agentic Web Attacks

Research Methodology

01 Observe
Expose security-relevant behavior across web ecosystems, browser internals, runtime execution, and automated interactions at scale.
02 Interpret
Connect low-level evidence to attacker goals, campaign infrastructure, modus operandi, and operational risk.
03 Prioritize
Rank findings by user exposure, reachability, and harm so defensive attention lands where it matters most.
04 Support action
Produce deployed tools, disclosed CVEs, and shared datasets that defenders, developers, and researchers can act on.

Applied to Web Security

Observe
Large-scale crawling and browser instrumentation
We instrument browser internals and deploy large-scale crawlers to capture third-party content loading, JavaScript execution paths, automated agent behavior, and attack page growth across millions of sites.
Justified by: Outguard (WWW'19) • Surveylance (S&P'18) • Covid Scams (ASIA CCS'23) • CAPTCHAs (WWW'24)
Interpret
Mapping evidence to attacker campaigns and evasion strategies
Raw measurements are analyzed to identify campaign infrastructure, modus operandi, provenance of injected content, and the fundamental limits of adversarial evasion against multi-modal defenses.
Justified by: Covid Scams (ASIA CCS'23) • OriginTracer (RAID'16) • Bot Barrier (WWW'24) • WebGuard (2024)
Prioritize
Ranking threats by reachability, sector criticality, and user exposure
Outguard identified 35 live campaigns; Surveylance traced 40% of scam pages back to the Alexa top 30K; the CAPTCHA study found 3,100+ affected sites in finance, government, and health — directly guiding where defenses are needed first.
Justified by: Outguard (WWW'19) • Surveylance (S&P'18) • CAPTCHAs (WWW'24) • NodeSec (WWW'24)
Support action
Deployed tools, CVE disclosures, and open datasets
Research produces usable artifacts: WebGuard and PriveShield are deployed in-application engines; NodeSec produced 19 CVEs and two US-CERT cases; Outguard, Surveylance, and Excision provide detection infrastructure security teams can adopt.
Justified by: WebGuard (2024) • PriveShield (2025) • NodeSec (WWW'24) • Outguard (WWW'19) • Excision (FC'16)

Web Measurement & Attack Ecosystems

Observe • Interpret

Outguard: Detecting In-Browser Covert Cryptocurrency Mining in the Wild

The Web Conference (WWW) 2019, Best Paper Award (1 of 225 accepted papers)

Outguard instruments browser parallelism primitives — rather than CPU thresholds or static signatures — to detect cryptojacking campaigns that evade rule-based filters. Deployed in the wild, it identified 35 active campaigns, 6,328 cryptojacking websites, and 24 previously unreported mining services, demonstrating that behavioral fingerprinting generalizes across obfuscated variants.

Observe • Interpret • Prioritize

Surveylance: Automatically Detecting Online Survey Scams

IEEE Security & Privacy (S&P) 2018

Surveylance crawled the web to map the full infrastructure of survey scam ecosystems, uncovering 8,623 gateway websites that funneled victims into over 318,000 deceptive survey pages. A critical finding: 40% of those pages were reachable directly from the Alexa top 30K, exposing mainstream users to identity fraud, malware, and scareware at scale.

Observe • Interpret

An End-to-End Analysis of Covid-Themed Scams in the Wild

ACM ASIA CCS 2023

This retrospective study tracked adversarial operations across the four months immediately following the Covid-19 outbreak (Feb–June 2020). By combining multiple measurement perspectives, it analyzes the composition, growth, and reachability of attack pages; reconstructs attacker modus operandi; and quantifies impact on end-users. The study shows how adversaries' technical and operational agility allowed novel attack techniques to bypass common defenses within weeks of a global crisis.

Browser Security & Content Integrity

Observe • Support action

Include Me Out: In-Browser Detection of Malicious Third-Party Content Inclusions

Financial Cryptography and Data Security (FC) 2016

Excision is an in-browser framework that provides a high-fidelity, real-time view of third-party content inclusion by monitoring the JavaScript loading pipeline directly. This approach uncovered novel distribution infrastructure for malicious code that static server-side analysis misses entirely, and enabled real-time interdiction of content loading before harm reaches the user.

Interpret • Support action

Identifying Extension-based Ad Injection via Fine-grained Web Content Provenance

RAID 2016

OriginTracer tracks the provenance of web content modifications caused by browser extensions during page load, exposing a class of ad-injection attacks that server-side analysis cannot see. The lightweight provenance mechanism illuminated how extensions silently rewrite pages and inspired follow-on privacy research on web tracking and advertisement delivery.

Support action

PriveShield: Enhancing User Privacy Using Automatic Isolated Profiles in Browsers

arXiv preprint, 2025

PriveShield is a browser extension that disrupts the information-gathering cycle of online tracking without requiring users to disable JavaScript or modify browser code. It automatically creates isolated profiles based on browsing history and site interactions, preventing cookie-based retargeting in 91% of 54 real-world test scenarios — while preserving the full browsing experience.

Web Application Security

Observe • Prioritize • Support action

(In)Security of File Uploads in Node.js

The Web Conference (WWW) 2024

NodeSec is an automated tool that generates unique payloads and evaluates web applications against 13 distinct Unrestricted File Upload (UFU) attack types. Applied to the most popular Node.js file upload libraries and real-world applications, it disclosed serious security bugs — resulting in 19 CVEs and two US-CERT cases. The study provides strong evidence that the dynamic features of Node.js applications introduce systematic security shortcomings in upload handling.

Automated Agent Detection & CAPTCHA Security

Observe • Prioritize

The Matter of CAPTCHAs: An Analysis of a Brittle Security Feature on the Modern Web

The Web Conference (WWW) 2024

This study evaluates the real-world security of text-based CAPTCHAs across the web by integrating a pre-trained solver into an automated web scanner, with no need for a large training dataset. The scanner solved more than 20% of previously unseen CAPTCHAs in a single attempt. Most critically, the study identified over 3,100 CAPTCHA-protected sites in high-risk sectors (finance, government, and healthcare) where this asymmetric attack capability poses serious operational risk.

Interpret • Support action

EnSolver: Uncertainty-Aware Ensemble CAPTCHA Solvers with Theoretical Guarantees

Journal of Machine Learning Research (JMLR) 2024

EnSolver equips CAPTCHA solvers with deep-ensemble uncertainty estimation, enabling them to detect and skip out-of-distribution challenges rather than fail them, since failures would trigger lockout defenses. Because the solver reasons about its own confidence, it is harder to detect and block. Novel theoretical bounds on solver effectiveness are proved and validated experimentally, providing a rigorous framework for assessing CAPTCHA robustness.
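
The skip-when-uncertain idea can be illustrated with a minimal Python sketch. It is not EnSolver's implementation: the `Solver` interface, the `solve_or_skip` name, the use of entropy over the averaged ensemble distribution, and the 0.5-bit threshold are all assumptions chosen for illustration.

```python
import math
from typing import Callable, Dict, Optional, Sequence

# Hypothetical interface: each ensemble member maps a CAPTCHA image to a
# probability distribution over candidate answers.
Solver = Callable[[bytes], Dict[str, float]]


def entropy_bits(dist: Dict[str, float]) -> float:
    """Shannon entropy of a probability distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def solve_or_skip(
    solvers: Sequence[Solver],
    image: bytes,
    max_entropy: float = 0.5,  # assumed threshold, not taken from the paper
) -> Optional[str]:
    """Average the members' distributions; answer only when uncertainty is low.

    Returns the most probable answer, or None to signal that the challenge
    should be skipped instead of risking a failed attempt.
    """
    mean: Dict[str, float] = {}
    for solver in solvers:
        for answer, p in solver(image).items():
            mean[answer] = mean.get(answer, 0.0) + p / len(solvers)

    # High entropy means the members disagree or are individually unsure,
    # which is treated here as a sign of an out-of-distribution challenge.
    if entropy_bits(mean) > max_entropy:
        return None
    return max(mean, key=mean.get)
```

In this sketch, members that agree on one answer yield a low-entropy average and the answer is returned; members that each favor a different answer yield a high-entropy average and the challenge is skipped.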

Interpret

Breaking the Bot Barrier: Evaluating Adversarial AI Techniques Against Multi-Modal Defense Models

The Web Conference Companion (WWW) 2024

Multi-modal bot detection models that analyze spatio-temporal browser events are increasingly deployed as a defense against credential stuffing and scanning. This work trains an LSTM on 825,701 artifacts from 46 users and develops two adversarial attacks, fast gradient and brute force, that generate behavioral vectors the model misclassifies. A key finding: although the attacks produced adversarially valid vectors, translating them into real-world scanning experiments was infeasible because automated tools are fundamentally limited in satisfying the required spatio-temporal constraints.
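
To make the fast-gradient attack concrete, here is a minimal Python sketch of a fast gradient sign step. It makes two loud assumptions: "fast gradient" is read as the standard fast gradient sign method, and a two-feature logistic-regression detector stands in for the paper's LSTM so the gradient can be written in closed form. The function names, weights, and feature values are hypothetical.

```python
import math
from typing import List


def bot_probability(w: List[float], b: float, x: List[float]) -> float:
    """Stand-in detector: logistic regression returning P(bot | x)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def fast_gradient_step(
    w: List[float], b: float, x: List[float], y: float, eps: float
) -> List[float]:
    """One fast-gradient-sign step on the input vector x.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to x is (p - y) * w. Moving each feature by eps in
    the direction of that gradient's sign increases the loss, pushing the
    prediction away from the true label y.
    """
    p = bot_probability(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


# Hypothetical detector weights and a behavioral vector from a scanner.
weights, bias = [2.0, -1.0], 0.0
scanner_vector = [1.0, 0.5]
adversarial_vector = fast_gradient_step(
    weights, bias, scanner_vector, y=1.0, eps=1.0
)
```

The perturbed vector only fools the model on paper. As the finding above notes, an automated tool would still have to reproduce the corresponding spatio-temporal behavior in a real browser session, which this sketch does not attempt.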

Interpret • Support action

WebGuard: In-Application Defense Against Evasive Web Scans through Behavioral Analysis

arXiv preprint, 2024

WebGuard is a low-overhead forensics engine that integrates directly into web applications — with no changes to underlying software or infrastructure — to detect and attribute automated web scanners in real time. It monitors multi-modal signals including spatio-temporal data and browser events, achieving detection within hundreds of milliseconds at under 10 KB/s communication overhead. Information-theoretic analysis shows that multi-modal monitoring significantly outperforms uni-modal (mouse-movement-only) approaches in both detection speed and attribution accuracy.
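
One way to frame a multi-modal versus uni-modal comparison is mutual information between the observed signals and the session label. The Python sketch below is an assumption about how such an analysis could look, not WebGuard's actual method. The session data is a toy set constructed so that the second modality helps, and the signal names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(
    observations: Sequence[Hashable], labels: Sequence[Hashable]
) -> float:
    """Empirical mutual information I(observation; label), in bits."""
    n = len(labels)
    joint = Counter(zip(observations, labels))
    obs_counts = Counter(observations)
    label_counts = Counter(labels)
    return sum(
        (c / n) * math.log2(c * n / (obs_counts[o] * label_counts[l]))
        for (o, l), c in joint.items()
    )


# Toy sessions: (mouse signal, keyboard signal, label). Illustrative only.
sessions = [
    ("smooth", "typed", "human"),
    ("smooth", "typed", "human"),
    ("jerky", "typed", "human"),
    ("smooth", "pasted", "scanner"),
    ("jerky", "pasted", "scanner"),
    ("jerky", "pasted", "scanner"),
]
labels = [label for _, _, label in sessions]

# Uni-modal view: mouse movement only.
uni_modal_bits = mutual_information([m for m, _, _ in sessions], labels)
# Multi-modal view: mouse and keyboard signals observed jointly.
multi_modal_bits = mutual_information([(m, k) for m, k, _ in sessions], labels)
```

On this toy data the joint observation carries more information about the label than mouse movement alone. Real measurements would require discretizing continuous spatio-temporal signals first, which the sketch leaves out.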
