Safety-specification-based adversarial training aims to generate examples that violate a formal safety specification and thereby provides a basis for repair. Maintaining high prediction accuracy while ensuring safe behavior remains challenging. We therefore present SpecAttack, a query-efficient counter-example generation and repair method for deep neural networks. SpecAttack lets users specify safety constraints on the model and finds inputs that violate these constraints. The violations are then used to repair the neural network via re-training such that it becomes provably safe. We evaluate SpecAttack's performance on the tasks of counter-example generation and repair. Our experimental evaluation demonstrates that SpecAttack is in most cases more query-efficient than comparable attacks and yields counter-examples of higher quality, while its repair technique is more efficient, maintains higher functional correctness, and provably guarantees compliance with the safety specification.
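To make the specify-attack-repair workflow concrete, the following is a minimal sketch of a counter-example-guided repair loop in PyTorch. It is not the actual SpecAttack algorithm: the toy specification (outputs bounded on an input box), the gradient-ascent search in `find_counter_examples`, and the naive re-training in `repair` are all simplified stand-ins chosen for illustration, and the resulting network is not provably safe.

```python
# Hypothetical sketch of counter-example-guided repair; NOT SpecAttack itself.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regressor with an assumed safety specification: for all inputs
# x in [0, 1]^2, the output must stay within [-OUT_BOUND, OUT_BOUND].
model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
LOW, HIGH = 0.0, 1.0
OUT_BOUND = 0.1

def violation(y):
    # Positive wherever the output leaves the allowed range.
    return torch.relu(y.abs() - OUT_BOUND)

def find_counter_examples(n=64, steps=50, lr=0.1):
    # Gradient ascent on |f(x)|, projected back onto the spec's input
    # region; a simple stand-in for query-efficient generation.
    x = (torch.rand(n, 2) * (HIGH - LOW) + LOW).requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-model(x).abs().sum()).backward()  # push outputs toward the bound
        opt.step()
        with torch.no_grad():
            x.clamp_(LOW, HIGH)             # stay inside the input region
    with torch.no_grad():
        mask = violation(model(x)).squeeze(-1) > 0
    return x.detach()[mask]

def repair(counter_examples, epochs=200, lr=1e-2):
    # Re-train on the violations; a real repair would also keep the
    # original task loss to preserve functional correctness, and would
    # verify the specification afterwards rather than assume it holds.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        violation(model(counter_examples)).mean().backward()
        opt.step()

cex = find_counter_examples()
print(f"counter-examples found: {len(cex)}")
if len(cex) > 0:
    repair(cex)
    print(f"remaining after repair: {len(find_counter_examples())}")
```

The key design point this sketch shares with the described method is the loop structure: violations of the specification are searched for directly, then fed back as training signal until the search no longer finds any.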