Protecting individuals' private information while still allowing modelers to draw inferences from confidential data sets is a concern of many data producers. Differential privacy is a framework that enables statistical analyses while controlling the potential leakage of private information. Prior work has focused on proposing differentially private statistical methods for various types of confidential data. However, almost no existing work has focused on the analysis of compositional data. In this article, we investigate differentially private approaches for analyzing compositional data using the Dirichlet distribution as the statistical model. We consider several methods, including frequentist and Bayesian procedures, along with computational strategies. We assess the approaches' performance using simulated data and illustrate their usefulness by applying them to data from the American Time Use Survey.
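As a rough illustration of what a differentially private analysis of compositional data can look like, the sketch below applies the standard Laplace mechanism to the mean-log sufficient statistics of a Dirichlet sample. It is not taken from the article and should not be read as the authors' procedure. The function name `dp_dirichlet_log_stats`, the clipping threshold `floor`, the replace-one-record notion of neighboring data sets, and the treatment of the sample size as public are all assumptions made for this example.

```python
import numpy as np

def dp_dirichlet_log_stats(X, epsilon, floor=1e-3, rng=None):
    """Release noisy mean-log statistics of compositional data (hypothetical helper).

    X       : (n, k) array whose rows are compositions (non-negative, summing to 1).
    epsilon : privacy-loss budget of the Laplace mechanism.
    floor   : assumed lower clipping bound that keeps log(x) bounded.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    n, k = X.shape

    # Clip each component to [floor, 1] so that log(x) lies in [log(floor), 0].
    logs = np.log(np.clip(X, floor, 1.0))

    # Replacing one record changes each column sum by at most |log(floor)|,
    # so k * |log(floor)| is an upper bound on the L1 sensitivity of the sums.
    sensitivity = k * abs(np.log(floor))

    # Laplace mechanism: add noise with scale sensitivity / epsilon to each sum.
    noisy_sums = logs.sum(axis=0) + rng.laplace(scale=sensitivity / epsilon, size=k)

    # Assumption: n is public, so dividing by it costs no extra privacy budget.
    return noisy_sums / n

# Usage on simulated data (parameter values chosen only for illustration).
rng = np.random.default_rng(0)
X = rng.dirichlet([2.0, 3.0, 5.0], size=1000)
noisy_stats = dp_dirichlet_log_stats(X, epsilon=1.0, rng=rng)
```

The noisy statistics could then be passed to a Dirichlet fitting routine in place of the exact ones. A smaller `floor` reduces clipping bias but increases the sensitivity and therefore the noise, so the threshold involves a bias-variance trade-off.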