Recent advances in web technologies make it more difficult than ever to detect and block web tracking systems. In this work, we propose ASTrack, a novel approach to web tracking detection and removal. ASTrack uses an abstraction of the code structure based on Abstract Syntax Trees to selectively identify web tracking functionality shared across multiple web services. This new methodology allows us to: (i) effectively detect web tracking code even when evasion techniques are used (e.g., obfuscation, minification, or webpackaging); and (ii) safely remove the portions of code used for tracking without affecting the legitimate functionality of the website. Our evaluation on the 10k most popular Internet domains shows that ASTrack can detect web tracking with high precision (98%), while discovering about 50k pieces of tracking code and more than 3,400 new tracking URLs not previously recognized by the most popular privacy-preserving tools (e.g., uBlock Origin). Moreover, ASTrack achieved a 36% reduction in functionality loss compared with filter lists, one of the safest options available. Using a novel methodology that combines computer vision and manual inspection, we estimate that full functionality is preserved in more than 97% of the websites.
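To illustrate why a structural abstraction can survive renaming and minification, the following is a minimal sketch and not ASTrack's actual algorithm. It assumes Python's standard `ast` module as a stand-in for a JavaScript parser, and the names `shape` and `structural_fingerprint` are hypothetical. The fingerprint keeps only AST node types, so identifiers, literal values, and whitespace do not affect it.

```python
import ast
import hashlib


def shape(node: ast.AST) -> str:
    """Serialize a subtree as nested node-type names, dropping names and values."""
    children = ",".join(shape(child) for child in ast.iter_child_nodes(node))
    return f"{type(node).__name__}({children})"


def structural_fingerprint(source: str) -> str:
    """Hash the structure of the code (hypothetical helper, for illustration only)."""
    return hashlib.sha256(shape(ast.parse(source)).encode()).hexdigest()


# Two snippets with the same structure but different identifiers and literals.
original = (
    "def send(uid, url):\n"
    "    payload = {'id': uid}\n"
    "    return post(url, payload)\n"
)
renamed = (
    "def a(b,c):\n"
    " d={'x':b}\n"
    " return e(c,d)\n"
)
# A structurally different snippet.
unrelated = "def add(a, b):\n    return a + b\n"

same_structure = structural_fingerprint(original) == structural_fingerprint(renamed)
different_structure = structural_fingerprint(original) != structural_fingerprint(unrelated)
```

Here `same_structure` and `different_structure` both evaluate to `True`: the renamed snippet matches the original, while the unrelated function does not. An exact hash is the simplest choice for a sketch; it would not tolerate structural obfuscation such as inserted dead code, which would need a more tolerant comparison.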