Previous automatic tracker detection work lacks features to recognize web page breakage and often resort to manual analysis to assess the breakage caused by blocking trackers. We introduce Dumviri, which incorporates a breakage detector that can automatically detect web page breakage caused by erroneously blocking a resource that is needed by the page to function properly. This addition allows Dumviri to prevent functional resources from being misclassified as trackers and increases overall detection accuracy. We designed Dumviri to take differential features. We further find that these features are agnostic to analysis granularity and enable Dumviri to predict tracking resources at the request field granularity, allowing Dumviri to handle some mixed trackers. Evaluating Dumviri on 15K pages shows its ability to replicate the labels of human-generated filter lists with an accuracy of 97.44%. Through a manual analysis, we found that Dumviri identified previously unreported trackers and its breakage detector can identify rules that cause web page breakage in commonly used filter lists like EasyPrivacy. In the case of mixed trackers, Dumviri, being the first automated mixed tracker detector, achieves a 79.09% accuracy. We have confirmed 22 previously unreported unique trackers and 26 unique mixed trackers. We promptly reported these findings to privacy developers, and we will publish our filter lists in uBlock Origin's extended syntax.
翻译:暂无翻译