Snoopy: A Webpage Fingerprinting Framework with Finite Query Model for Mass-Surveillance

From arXiv. The code used for the analyses presented in the paper will be made available online only after the manuscript is accepted for publication at a conference or journal.

Internet users are vulnerable to privacy attacks despite the use of encryption. Webpage fingerprinting, an attack that analyzes encrypted traffic, can identify the webpages a user visits on a given website. Recent research has demonstrated webpage fingerprinting attacks on individual users but has not succeeded in extending these attacks to mass surveillance. The key challenges in performing mass-scale webpage fingerprinting arise from (i) the sheer number of combinations of user behavior and preferences to account for, and (ii) the bound on the number of website queries imposed by the defense mechanisms (e.g., DDoS defense) deployed at the website. These constraints preclude the use of conventional data-intensive ML-based techniques. In this work, we propose Snoopy, a first-of-its-kind framework that performs webpage fingerprinting for a large number of users visiting a website. Snoopy meets the generalization requirements of mass surveillance while complying with a bound on the number of website accesses (the finite query model) for traffic sample collection. To this end, Snoopy uses a feature (the sequence of encrypted resource sizes) that is either unaffected or predictably affected by different browsing contexts (OS, browser, caching, cookie settings). Snoopy uses static analysis techniques to predict the variations that arise from the diversity of browsing contexts, caused by factors such as header sizes, MTU, and the User-Agent string. We show that Snoopy achieves approximately 90% accuracy on most websites across various browsing contexts. In cases where Snoopy alone does not perform well, a simple ensemble of Snoopy and an ML-based technique achieves approximately 97% accuracy while adhering to the finite query model.
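The abstract gives no algorithmic details, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes each page is fingerprinted once (one query per page, in the spirit of a finite query budget) as an ordered list of encrypted resource sizes; that a browsing context shifts every size by a fixed, statically predicted byte offset (`header_delta`, a hypothetical stand-in for header and User-Agent differences; MTU effects are ignored); and that caching shows up as resources missing from the observed trace, which an LCS-style alignment tolerates. All names (`Context`, `predict_sizes`, `match_score`, `classify`), the tolerance, and the example sizes are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical browsing context (OS, browser, cookie settings, ...).

    Assumption: its only effect on the trace is a fixed byte offset added to
    every encrypted resource size (e.g. from a longer User-Agent string or
    extra headers). MTU/packetization effects are not modeled.
    """
    header_delta: int


def predict_sizes(reference: list[int], ctx: Context) -> list[int]:
    """Shift a reference fingerprint (collected once) into another context."""
    return [size + ctx.header_delta for size in reference]


def match_score(observed: list[int], predicted: list[int], tolerance: int) -> int:
    """Longest in-order alignment of sizes that agree within `tolerance` bytes.

    Cached resources simply vanish from `observed`, so an LCS-style alignment
    skips them instead of misaligning every later comparison.
    """
    n, m = len(observed), len(predicted)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(observed[i - 1] - predicted[j - 1]) <= tolerance:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def classify(observed: list[int], fingerprints: dict[str, list[int]],
             ctx: Context, tolerance: int = 8) -> str | None:
    """Return the page whose context-adjusted fingerprint aligns best."""
    best_page, best_score = None, -1
    for page, reference in fingerprints.items():
        score = match_score(observed, predict_sizes(reference, ctx), tolerance)
        if score > best_score:
            best_page, best_score = page, score
    return best_page


# Invented toy data: one reference trace per page (one query per page).
fingerprints = {
    "/home": [5120, 20480, 1024, 3300],
    "/about": [4800, 15000, 2048],
    "/contact": [5120, 900, 700],
}
# A visitor whose context adds 40 bytes per resource and who already has
# the 20480-byte resource of /home in the cache.
observed = [5160, 1064, 3340]
print(classify(observed, fingerprints, Context(header_delta=40)))  # → /home
```

Raw match counts favor pages with many resources; normalizing the score by fingerprint length is an obvious refinement this sketch omits.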