Differential privacy provides a strong form of privacy while allowing most of the original characteristics of the data set to be preserved. Realizing these benefits, however, requires designing dedicated differentially private data analysis algorithms. In this work, we present three tree-based algorithms for mining redescriptions while preserving differential privacy. Redescription mining is an exploratory data analysis method for finding connections between two views over the same entities, such as the phenotypes and genotypes of medical patients. It has applications in many fields, including some, such as health care informatics, where privacy-preserving access to data is desired. Our algorithms are the first differentially private redescription mining algorithms, and our experiments show that, despite the noise inherent in differential privacy, they can return trustworthy results even on smaller data sets, where noise typically has a stronger effect.
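To make the two ingredients concrete, here is a minimal Python sketch. It is not the tree-based algorithms of this work. It assumes that a redescription is a pair of queries, one per view, scored by the Jaccard index of the entity sets they select, and it swaps in the standard Laplace mechanism on the intersection and union counts as one generic way to release that score privately. The toy data, attribute names (A, B, X) and function names are hypothetical.

```python
import random

def jaccard(left, right):
    """Exact Jaccard index between the entity sets selected by the two queries."""
    union = left | right
    return len(left & right) / len(union) if union else 0.0

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_jaccard(left, right, epsilon, rng):
    """Hypothetical epsilon-DP estimate of the Jaccard index (not the paper's method).

    Adding or removing one entity changes the intersection and union counts by
    at most 1 each (sensitivity 1), so each count gets Laplace noise with half
    of the budget; by sequential composition the pair is epsilon-DP, and the
    ratio and clipping below are post-processing.
    """
    scale = 2.0 / epsilon  # each count spends epsilon / 2
    noisy_inter = len(left & right) + laplace_noise(scale, rng)
    noisy_union = len(left | right) + laplace_noise(scale, rng)
    estimate = max(noisy_inter, 0.0) / max(noisy_union, 1.0)
    return min(max(estimate, 0.0), 1.0)

# Hypothetical toy data: six patients described in two views.
phenotypes = {0: {"A"}, 1: {"A", "B"}, 2: {"B"}, 3: {"A", "B"}, 4: set(), 5: {"A", "B"}}
genotypes = {0: set(), 1: {"X"}, 2: {"X"}, 3: {"X"}, 4: set(), 5: {"X"}}

# Redescription q_L <-> q_R with q_L = "A and B" (left view), q_R = "X" (right view).
left = {p for p, traits in phenotypes.items() if {"A", "B"} <= traits}
right = {p for p, variants in genotypes.items() if "X" in variants}

print(jaccard(left, right))  # 0.75
print(private_jaccard(left, right, epsilon=1.0, rng=random.Random(0)))
```

With only six entities, the noise scale of 2/epsilon is comparable to the counts themselves, so the private estimate can deviate substantially from 0.75; this illustrates why small data sets are the hard case for differentially private mining.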