The vulnerability of machine learning models to adversarial perturbations has motivated a significant amount of research under the broad umbrella of adversarial machine learning. Sophisticated attacks may cause learning algorithms to learn decision functions, or to make decisions, with poor predictive performance. In this context, there is a growing body of literature that uses local intrinsic dimensionality (LID), a local metric that describes the minimum number of latent variables required to describe each data point, for detecting adversarial samples and subsequently mitigating their effects. The research to date has tended to focus on using LID as a practical defence method, often without fully explaining why LID can detect adversarial samples. In this paper, we derive a lower bound and an upper bound for the LID value of a perturbed data point and demonstrate that the bounds, in particular the lower bound, have a positive correlation with the magnitude of the perturbation. Hence, we show that data points perturbed by a large amount have larger LID values than unperturbed samples, justifying the use of LID in the prior literature. Furthermore, our empirical validation confirms the validity of the bounds on benchmark datasets.
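For reference, a standard way to estimate LID in this line of work is the maximum-likelihood estimator computed from nearest-neighbour distances; this particular estimator is an illustrative assumption here, since the abstract does not specify which estimator the derivation uses:
\[
\widehat{\mathrm{LID}}(x) = -\left( \frac{1}{k} \sum_{i=1}^{k} \log \frac{r_i(x)}{r_k(x)} \right)^{-1},
\]
where $r_i(x)$ is the distance from $x$ to its $i$-th nearest neighbour in a reference sample and $r_k(x)$ is the distance to the $k$-th (furthest) of those neighbours. Under this estimator, a point whose neighbour distances are nearly uniform ($r_i(x) \approx r_k(x)$ for all $i$) receives a large LID value, whereas a point whose nearest neighbours are much closer than its $k$-th neighbour receives a small one.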