While machine-learning algorithms have demonstrated a strong ability to detect Android malware, they can be evaded by sparse evasion attacks crafted by injecting a small set of fake components, e.g., permissions and system calls, without compromising the malware's intrusive functionality. Previous work has shown that, to improve robustness against such attacks, learning algorithms should avoid overemphasizing a few discriminant features and should instead provide decisions that rely upon a large subset of components. In this work, we investigate whether gradient-based attribution methods, used to explain classifiers' decisions by identifying the most relevant features, can help identify and select more robust algorithms. To this end, we propose to exploit two different metrics that represent the evenness of explanations, along with a new, compact security measure called the Adversarial Robustness Metric. Our experiments, conducted on two different datasets and five classification algorithms for Android malware detection, show that a strong connection exists between the evenness of explanations and adversarial robustness. In particular, we find that popular techniques such as Gradient*Input and Integrated Gradients correlate strongly with security when applied to both linear and nonlinear detectors, whereas more elementary explanation techniques, such as the simple Gradient, do not provide reliable information about the robustness of such classifiers.
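To make the quantities named in the abstract concrete, the sketch below shows how the three gradient-based attribution methods it mentions (Gradient, Gradient*Input, and Integrated Gradients) can be computed for a differentiable detector, together with one plausible evenness score. This is a minimal PyTorch sketch under stated assumptions: the toy linear detector and the binary Android feature vector are hypothetical, and the `evenness` function (a normalized L1/L-infinity ratio) is an illustrative stand-in, not necessarily the two evenness metrics or the Adversarial Robustness Metric defined in the paper.

```python
import torch

def gradient_attribution(model, x):
    # Plain Gradient: derivative of the detector's score w.r.t. the input.
    x = x.clone().requires_grad_(True)
    model(x).sum().backward()
    return x.grad.detach()

def gradient_x_input(model, x):
    # Gradient*Input: elementwise product of the gradient and the input.
    return gradient_attribution(model, x) * x

def integrated_gradients(model, x, baseline=None, steps=50):
    # Integrated Gradients: average the gradient along the straight-line
    # path from the baseline (here an all-zero feature vector) to x,
    # then scale elementwise by (x - baseline).
    if baseline is None:
        baseline = torch.zeros_like(x)
    grads = [gradient_attribution(model, baseline + a * (x - baseline))
             for a in torch.linspace(0.0, 1.0, steps)]
    return (x - baseline) * torch.stack(grads).mean(dim=0)

def evenness(attributions, eps=1e-12):
    # An illustrative evenness score in [1/d, 1]: the L1/L-infinity norm
    # ratio of the attribution vector, normalized by the number of
    # features d. It approaches 1 when relevance is spread uniformly
    # across features and 1/d when a single feature dominates. This is
    # one plausible instantiation, not the paper's exact metrics.
    a = attributions.abs().flatten()
    return (a.sum() / (a.max() + eps) / a.numel()).item()

if __name__ == "__main__":
    torch.manual_seed(0)
    d = 1000                           # number of Android features (hypothetical)
    model = torch.nn.Linear(d, 1)      # stand-in linear detector
    x = (torch.rand(d) > 0.9).float()  # sparse binary feature vector
    for name, attr in [("Gradient", gradient_attribution(model, x)),
                       ("Gradient*Input", gradient_x_input(model, x)),
                       ("Integrated Gradients", integrated_gradients(model, x))]:
        print(f"{name:>20s}: evenness = {evenness(attr):.4f}")
```

On this toy linear detector, Gradient*Input and Integrated Gradients coincide (the gradient is constant along the integration path) and concentrate relevance on the features present in the sample, whereas the plain Gradient spreads relevance over all features regardless of the input; this is why an evenness score computed on input-aware attributions can carry information about how many components a decision actually relies upon.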