Robustness of machine learning models is critical for security-related applications, where real-world adversaries are uniquely focused on evading neural-network-based detectors. Prior work mainly focuses on crafting adversarial examples (AEs) with small, uniform, norm-bounded perturbations across features to satisfy the requirement of imperceptibility. However, uniform perturbations do not produce realistic AEs in domains such as malware, finance, and social networks. In these applications, features typically exhibit semantically meaningful dependencies. The key idea of our proposed approach is to enable non-uniform perturbations that adequately represent these feature dependencies during adversarial training. We propose using characteristics of the empirical data distribution, capturing both the correlations between features and the importance of the features themselves. Using experimental datasets for malware classification, credit risk prediction, and spam detection, we show that our approach is more robust to real-world attacks. Finally, we present robustness certification utilizing non-uniform perturbation bounds, and show that non-uniform bounds achieve better certification.
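The abstract describes the approach only at a high level. Below is a minimal sketch of how non-uniform, per-feature perturbation bounds might be plugged into PGD-based adversarial training, assuming PyTorch; the helper names (`non_uniform_eps`, `pgd_attack`, `adversarial_training_step`) and the use of per-feature standard deviations as a stand-in for the distribution-derived statistics are illustrative assumptions, not the authors' implementation.

```python
# Sketch (assumption, not the paper's exact method): PGD adversarial training where
# the per-feature perturbation budget is derived from the empirical data distribution,
# here approximated by each feature's standard deviation.
import torch
import torch.nn as nn


def non_uniform_eps(X: torch.Tensor, base_eps: float = 0.1) -> torch.Tensor:
    """Per-feature budgets proportional to empirical feature spread (illustrative proxy)."""
    std = X.std(dim=0)
    return base_eps * std / (std.mean() + 1e-12)


def pgd_attack(model, x, y, eps, alpha=0.01, steps=10):
    """PGD inside a non-uniform (per-feature) L-infinity box [-eps_i, eps_i]."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = nn.functional.cross_entropy(model(x + delta), y)
        loss.backward()
        # Gradient ascent step, then project back into the per-feature box.
        delta.data = torch.max(
            torch.min(delta.data + alpha * delta.grad.sign(), eps), -eps
        )
        delta.grad.zero_()
    return (x + delta).detach()


def adversarial_training_step(model, optimizer, x, y, eps):
    """One training step on AEs crafted within the non-uniform bounds."""
    x_adv = pgd_attack(model, x, y, eps)
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```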