Differential privacy is known to protect against the threats to validity posed by adaptive, or exploratory, data analysis -- even when the analyst adversarially searches for a statistical estimate that diverges from the true value of the quantity of interest on the underlying population. The cost of this protection is the accuracy loss that differential privacy incurs. In this work, inspired by standard models in the genomics literature, we consider data models in which individuals are represented by a sequence of attributes with the property that distant attributes are only weakly correlated. We show that, under this assumption, it is possible to "re-use" privacy budget on different portions of the data, significantly improving accuracy without increasing the risk of overfitting.