This paper investigates the vulnerability of the nearest neighbors search, which is a pivotal tool in data analysis and machine learning. The vulnerability is gauged as the relative amount of perturbation that an attacker needs to add onto a dataset point in order to modify its neighbor rank w.r.t. a query. The statistical distribution of this quantity is derived from simple assumptions. Experiments on six large scale datasets validate this model up to some outliers which are explained in term of violations of the assumptions.
翻译:本文调查最近的邻居搜索的脆弱性,这是数据分析和机器学习的关键工具。 脆弱性被测量为攻击者需要添加到一个数据集点上的相对扰动量, 以修改其邻居的级别 。 此数量的统计分布来自简单的假设。 在六个大型数据集上进行的实验证实了这一模型, 直至某些异常点, 这些异常点以违反假设的方式加以解释 。