Deep Active Learning (DAL) aims to reduce labeling costs in neural-network training by prioritizing the most informative unlabeled samples for annotation. Beyond selecting which samples to label, several DAL approaches further enhance data efficiency by augmenting the training set with synthetic inputs that do not require additional manual labeling. In this work, we investigate how augmenting the training data with adversarial inputs that violate robustness constraints can improve DAL performance. We show that adversarial examples generated via formal verification contribute substantially more than those produced by standard, gradient-based attacks. We apply this extension to multiple modern DAL techniques, as well as to a new technique that we propose, and show that it yields significant improvements in model generalization across standard benchmarks.