Outlier detection plays a significant role in real-world applications such as intrusion, malfunction, and fraud detection. Traditionally, outlier detection techniques are applied to find outliers in the context of the whole dataset. However, this practice neglects contextual outliers, which are not outliers in the whole dataset but are outliers in specific neighborhoods. Contextual outliers are particularly important in data exploration and in targeted anomaly explanation and diagnosis. In these scenarios, the data owner computes the following information: i) the attributes that contribute to the abnormality of an outlier (metric), ii) a contextual description of the outlier's neighborhoods (context), and iii) the utility score of the context, e.g., its strength in showing the outlier's significance, or its relation to a particular explanation for the outlier. However, revealing the outlier's context also leaks information about the other individuals in the population, violating their privacy. We address the issue of population privacy violations in this paper and propose solutions to its two main challenges. In this setting, the data owner is required to release a valid context for the queried record, i.e., a context in which the record is an outlier. Hence, the first major challenge is that the privacy technique must preserve the validity of the context for each record. To satisfy this requirement, we propose techniques that protect the privacy of individuals through a relaxed notion of differential privacy. The second major challenge is applying the proposed techniques efficiently, as they add substantial computational overhead to the base algorithm. To overcome this challenge, we propose a graph structure onto which the contexts are mapped, and introduce differentially private graph search algorithms as efficient solutions to the computation problem caused by the differential privacy techniques.
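To make the notions of a contextual outlier and a valid context concrete, the following is a minimal Python sketch under stated assumptions; it is not the method proposed in the paper. The toy records, the candidate contexts, the leave-one-out z-score test, the use of population size as the utility score, and the names `is_outlier`, `valid_contexts`, and `sample_context` are all hypothetical. The abstract does not name a selection mechanism, so the sketch swaps in the standard exponential mechanism, restricted to valid contexts, purely as an illustration; it makes no claim about satisfying the relaxed privacy notion referred to above.

```python
import math
import random
from statistics import mean, pstdev

# Hypothetical toy records: (department, site, salary).
RECORDS = (
    [("intern", "A", s) for s in (20, 21, 22, 23)]
    + [("intern", "B", s) for s in (24, 25, 26, 27)]
    + [("engineer", "A", s) for s in (85, 90, 95, 100)]
    + [("engineer", "B", s) for s in (105, 110, 115, 120)]
)
QUERY = ("intern", "A", 60)  # the queried record

# Hypothetical candidate contexts, each a predicate over a record.
CONTEXTS = {
    "all": lambda r: True,
    "dept=intern": lambda r: r[0] == "intern",
    "site=A": lambda r: r[1] == "A",
    "dept=intern & site=A": lambda r: r[0] == "intern" and r[1] == "A",
}


def is_outlier(query, population, threshold=3.0):
    """Leave-one-out z-score test (an assumed, simple outlier detector)."""
    values = [r[2] for r in population]
    if len(values) < 2:
        return False
    spread = pstdev(values)
    if spread == 0:
        return query[2] != values[0]
    return abs(query[2] - mean(values)) / spread > threshold


def valid_contexts(query, records):
    """Map each context in which the query is an outlier to its utility.

    Utility is assumed to be the population size, counting the query itself.
    """
    valid = {}
    for name, pred in CONTEXTS.items():
        if not pred(query):
            continue  # a context must contain the queried record
        population = [r for r in records if pred(r)]
        if is_outlier(query, population):
            valid[name] = len(population) + 1
    return valid


def sample_context(valid, epsilon, sensitivity=1.0, rng=random):
    """Exponential mechanism over valid contexts only.

    The sensitivity value is an assumption, not derived from the paper.
    """
    names = list(valid)
    top = max(valid.values())  # shift utilities for numerical stability
    weights = [
        math.exp(epsilon * (valid[n] - top) / (2 * sensitivity)) for n in names
    ]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    valid = valid_contexts(QUERY, RECORDS)
    print(valid)
    print(sample_context(valid, epsilon=1.0))
```

In this toy data the queried salary of 60 sits near the middle of the whole population, so it is not an outlier globally, yet it is far from the other interns. Sampling only among valid contexts means any released context is one in which the record is an outlier, which mirrors the validity requirement stated in the abstract.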