Deep neural networks usually require large labeled datasets to achieve state-of-the-art performance in tasks such as image classification and natural language processing. Although active Internet users create vast amounts of data every day, most of it is unlabeled and vulnerable to data poisoning attacks. In this paper, we develop an efficient active learning method that requires fewer labeled instances and incorporates adversarial retraining, in which additional labeled artificial data are generated without increasing the labeling budget. The generated adversarial examples also provide a way to measure the vulnerability of the model. To evaluate the proposed method under an adversarial setting, i.e., malicious mislabeling and data poisoning attacks, we perform an extensive evaluation on a reduced CIFAR-10 dataset containing only two classes: airplane and frog. Our experimental results demonstrate that the proposed active learning method is effective in defending against malicious mislabeling and data poisoning attacks. Specifically, whereas a baseline active learning method based on a random sampling strategy performs poorly (about 50% accuracy) under a malicious mislabeling attack, the proposed method achieves the desired accuracy of 89% using, on average, only one-third of the dataset.