Internet censorship is a phenomenon of societal importance that attracts investigation from multiple disciplines. Several research groups, such as Censored Planet, have deployed large-scale Internet measurement platforms to collect network reachability data. However, existing studies generally rely on manually designed rules (i.e., censorship fingerprints) to detect network-based Internet censorship from the data. While this rule-based approach yields a high true positive detection rate, it has several drawbacks: it requires human expertise, is laborious, and cannot detect any censorship not captured by the rules. Seeking to overcome these drawbacks, we design and evaluate two classification models to detect network-based Internet censorship: one based on latent feature representation learning and one based on images. To infer latent feature representations from network reachability data, we propose a sequence-to-sequence autoencoder that captures the structure and the order of the elements in the data. To estimate the probability of censorship events from the inferred latent features, we rely on a densely connected multi-layer neural network model. Our image-based classification model encodes a network reachability data record as a gray-scale image and classifies the image as censored or not using a dense convolutional neural network. We compare and evaluate both approaches on data sets from Censored Planet via a hold-out evaluation. Both classification models are capable of detecting network-based Internet censorship, as both identified instances of censorship not detected by the known fingerprints. Latent feature representations likely encode more of the nuances in the data, since the latent feature learning approach discovers a greater number, and a more diverse set, of new censorship instances.
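To make the image-based idea concrete, the following is a minimal sketch of how a single reachability record could be turned into a gray-scale image. The abstract does not specify the actual encoding, so everything here is an assumption for illustration: the function name `record_to_grayscale`, the JSON serialization, the fixed square image size with truncation and zero padding, and the record fields, which are not Censored Planet's real schema.

```python
import json
from typing import List


def record_to_grayscale(record: dict, side: int = 16) -> List[List[int]]:
    """Encode a record as a side x side grid of 8-bit gray levels (0-255).

    Assumed encoding, not the one used in the paper: each byte of the
    serialized record becomes one pixel intensity.
    """
    # Serialize deterministically so that equal records give equal images.
    raw = json.dumps(record, sort_keys=True).encode("utf-8")
    capacity = side * side
    # Truncate long records and zero-pad short ones to a fixed size.
    pixels = list(raw[:capacity]) + [0] * max(0, capacity - len(raw))
    # Each image row is one contiguous slice of the byte sequence.
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Hypothetical record; the field names are illustrative only.
example = {"domain": "example.com", "status": 200, "error": None}
image = record_to_grayscale(example, side=8)
```

A grid of this kind is the sort of input a convolutional classifier could consume. Fixing the image size keeps the input shape constant across records, at the cost of discarding any bytes beyond the chosen capacity.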