This paper considers the private release of statistics of disjoint subsets of a dataset in the setting of data heterogeneity, where users may contribute more than one sample and different users may contribute different numbers of samples. In particular, we focus on the $\epsilon$-differentially private release of the sample means and variances of the sample values in disjoint subsets of a dataset, under the assumption that the number of contributions of each user in each subset is publicly known. Our main contribution is an iterative algorithm, based on suppressing user contributions, which seeks to reduce the overall privacy loss degradation under a canonical Laplace mechanism without increasing the worst estimation error among the subsets. Important components of this analysis are our exact, analytical characterizations of the sensitivities and worst-case bias errors of estimators of the sample mean and variance that are obtained by clipping or suppressing user contributions. We test the performance of our algorithm on real-world and synthetic datasets and demonstrate clear improvements in the privacy loss degradation for a fixed worst-case estimation error.
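
To make the setting concrete, the following is a minimal illustrative sketch of releasing a clipped sample mean for one subset under a Laplace mechanism. It is not the paper's algorithm, and its estimator and sensitivity need not coincide with the paper's exact characterizations. The assumptions, none of which come from the abstract, are: sample values lie in a known range $[0, U]$ (`upper`); each user keeps at most `m` samples; neighboring datasets differ in the values of one user while the per-user counts stay fixed and public. The function names `clipped_mean_sensitivity` and `private_clipped_mean` are hypothetical.

```python
import numpy as np


def clipped_mean_sensitivity(counts, m, upper):
    """User-level sensitivity of the clipped mean (illustrative assumption).

    Assumes values in [0, upper] and public per-user counts. Changing one
    user's values moves the clipped sum by at most upper * min(count, m).
    """
    kept = np.minimum(np.asarray(counts), m)
    return upper * kept.max() / kept.sum()


def private_clipped_mean(samples_by_user, m, upper, epsilon, rng=None):
    """Release the clipped sample mean with Laplace noise of scale sens / epsilon."""
    rng = np.random.default_rng() if rng is None else rng
    counts = [len(s) for s in samples_by_user]
    # Keep at most m samples per user and force values into [0, upper].
    kept = [np.clip(np.asarray(s[:m], dtype=float), 0.0, upper)
            for s in samples_by_user]
    total = sum(k.sum() for k in kept)
    n_kept = sum(len(k) for k in kept)  # assumed to be positive
    sens = clipped_mean_sensitivity(counts, m, upper)
    return total / n_kept + rng.laplace(scale=sens / epsilon)


# Hypothetical usage: three users with 1, 2, and 10 samples in one subset.
users = [[1.0], [0.0, 1.0], [1.0] * 10]
estimate = private_clipped_mean(users, m=3, upper=1.0, epsilon=1.0)
```

In this sketch, a smaller `m` lowers the sensitivity (and hence the noise) but discards more samples from heavy contributors, which introduces bias. This is the kind of trade-off between clipping or suppression and estimation error that the abstract refers to.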