The proliferation of global censorship has led to the development of a plethora of measurement platforms to monitor and expose it. Censorship of the domain name system (DNS) is a key mechanism used across different countries. It is currently detected by applying heuristics to samples of DNS queries and responses (probes) for specific destinations. These heuristics, however, are platform-specific and have proven brittle when censors change their blocking behavior, which calls for a more reliable automated process for detecting censorship. In this paper, we explore how machine learning (ML) models can (1) help streamline the detection process, (2) improve the usability of large-scale datasets for censorship detection, and (3) discover new censorship instances and blocking signatures missed by existing heuristic methods. Our study shows that supervised models, trained using expert-derived labels on instances of known anomalies and possible censorship, can learn the detection heuristics employed by different measurement platforms. More crucially, we find that unsupervised models, trained solely on uncensored instances, can identify new instances and variations of censorship missed by existing heuristics. Moreover, both methods can uncover a substantial number of new DNS blocking signatures, i.e., injected fake IP addresses overlooked by existing heuristics. These results are underpinned by an important methodological finding: comparing the outputs of models trained on the same probes but with labels arising from independent processes allows us to detect cases of censorship more reliably in the absence of ground-truth censorship labels.
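To make the unsupervised idea concrete, the following is a minimal sketch of novelty detection trained only on uncensored probes. It is not the paper's model: a simple per-feature z-score baseline stands in for whatever unsupervised learner is actually used, and the feature tuple (response time, number of answers, TTL), the sample values, and the threshold are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev

# Hypothetical numeric features per DNS probe (an assumption, not from the paper):
# (response_time_ms, num_answers, ttl_seconds)


def fit_baseline(clean_probes):
    """Learn per-feature mean and standard deviation from uncensored probes only."""
    columns = list(zip(*clean_probes))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 when a feature has zero spread, to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def anomaly_score(probe, baseline):
    """Return the largest absolute z-score of the probe across all features."""
    means, stds = baseline
    return max(abs(x - m) / s for x, m, s in zip(probe, means, stds))


def is_anomalous(probe, baseline, threshold=3.0):
    """Flag a probe whose score exceeds the (hypothetical) threshold."""
    return anomaly_score(probe, baseline) > threshold


# Made-up uncensored probes used as the training set.
clean = [
    (42.0, 2, 300),
    (38.0, 2, 300),
    (45.0, 3, 280),
    (40.0, 2, 320),
    (41.0, 2, 300),
]
baseline = fit_baseline(clean)

typical = (41.0, 2, 300)    # resembles the uncensored training data
suspicious = (3.0, 1, 60)   # very fast answer with a short TTL, as an injected reply might look

print(is_anomalous(typical, baseline))
print(is_anomalous(suspicious, baseline))
```

Because the baseline is fit on uncensored data alone, a flagged probe is only a candidate for review: it deviates from normal resolution behavior, which may or may not be due to censorship.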