This paper considers the $\varepsilon$-differentially private (DP) release of an approximate cumulative distribution function (CDF) of the samples in a dataset. We assume that the true (approximate) CDF is obtained after lumping the data samples into a fixed number $K$ of bins. In this work, we extend the well-known binary tree mechanism to the class of \emph{level-uniform tree-based} mechanisms and identify $\varepsilon$-DP mechanisms that have a small $\ell_2$-error. We identify optimal or close-to-optimal tree structures when either of the parameters, which are the branching factors or the privacy budgets at each tree level, are given, and when the algorithm designer is free to choose both sets of parameters. Interestingly, when we allow the branching factors to take on real values, under certain mild restrictions, the optimal level-uniform tree-based mechanism is obtained by choosing equal branching factors \emph{independent} of $K$, and equal privacy budgets at all levels. Furthermore, for selected $K$ values, we explicitly identify the optimal \emph{integer} branching factors and tree height, assuming equal privacy budgets at all levels. Finally, we describe general strategies for improving the private CDF estimates further, by combining multiple noisy estimates and by post-processing the estimates for consistency.
翻译:暂无翻译