When applying outlier detection in settings where data is sensitive, mechanisms that guarantee the privacy of the underlying data are needed. The $k$-nearest neighbors ($k$-NN) algorithm is one of the simplest and most effective methods for outlier detection, yet so far no differentially private ($\epsilon$-DP) approach to $k$-NN based outlier detection has been proposed. Existing approaches often relax the notion of $\epsilon$-DP and employ methods other than $k$-NN. We propose a method for $k$-NN based outlier detection that separates the procedure into a fitting step on reference inlier data, followed by the application of the outlier classifier to new data. We achieve $\epsilon$-DP with respect to the reference data for both the fitting algorithm and the outlier classifier by partitioning the dataset into a uniform grid, which yields low global sensitivity. Our approach achieves nearly optimal performance on real-world data of varying dimensionality when compared to non-private versions of $k$-NN.
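The fit-then-classify structure over a uniform grid can be illustrated with a minimal sketch. This is not the authors' exact algorithm: the cell width and outlier threshold are hypothetical parameters, and the sketch uses the standard Laplace mechanism on grid-cell counts (each point affects exactly one cell, so global sensitivity is 1) rather than a $k$-NN distance computation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_private_grid(X, cell_width, epsilon):
    """Fit an epsilon-DP histogram over a uniform grid (illustrative sketch).

    Each reference point falls into exactly one cell, so the global
    sensitivity of the count vector is 1 and Laplace(1/epsilon) noise
    per occupied cell suffices for epsilon-DP.
    """
    counts = {}
    for x in X:
        key = tuple(np.floor(x / cell_width).astype(int))
        counts[key] = counts.get(key, 0) + 1
    # Release only noisy counts; the raw data is not used afterwards.
    return {k: v + rng.laplace(scale=1.0 / epsilon) for k, v in counts.items()}

def is_outlier(x, noisy_counts, cell_width, threshold):
    """Classify a new point: a sparsely populated cell signals an outlier.

    Only the already-released noisy counts are queried, so classifying
    new points consumes no additional privacy budget (post-processing).
    """
    key = tuple(np.floor(np.asarray(x) / cell_width).astype(int))
    return noisy_counts.get(key, 0.0) < threshold

# Toy usage: a dense inlier cluster around the origin, then two queries.
X = rng.normal(0.0, 1.0, size=(500, 2))
model = fit_private_grid(X, cell_width=1.0, epsilon=1.0)
print(is_outlier([10.0, 10.0], model, cell_width=1.0, threshold=5.0))  # far point
print(is_outlier([0.0, 0.0], model, cell_width=1.0, threshold=5.0))    # dense cell
```

The fitting step touches the sensitive reference data exactly once; the classifier is pure post-processing of the noisy grid, which is why both steps satisfy $\epsilon$-DP with respect to the reference set in this sketch.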