When a database is protected by Differential Privacy (DP), its usability is limited in scope. In this scenario, generating a synthetic version of the data that mimics the properties of the private data allows users to perform any operation on the synthetic data while maintaining the privacy of the original data. Consequently, multiple works have been devoted to devising systems for DP synthetic data generation. However, such systems may preserve or even magnify properties of the data that make it unfair, rendering the synthetic data unfit for use. In this work, we present PreFair, a system that allows for DP fair synthetic data generation. PreFair extends state-of-the-art DP data generation mechanisms by incorporating a causal fairness criterion that ensures fair synthetic data. We adapt the notion of justifiable fairness to fit the synthetic data generation scenario. We further study the problem of generating DP fair synthetic data, showing its intractability and designing algorithms that are optimal under certain assumptions. We also provide an extensive experimental evaluation, showing that PreFair generates synthetic data that is significantly fairer than the data generated by leading DP data generation mechanisms, while remaining faithful to the private data.