This paper presents a novel method for generating differentially private tabular datasets from hierarchical data, with a specific focus on origin-destination (O/D) trips. The approach builds upon the TopDown algorithm, a constraint-based mechanism developed by the US Census Bureau to incorporate invariant queries into tabular data. O/D hierarchical data are datasets that represent trips between geographical areas organized in a hierarchical structure (e.g., region $\rightarrow$ province $\rightarrow$ city). The method is designed to improve accuracy on queries that span wider geographical areas and are obtained by aggregating finer-grained ones. High accuracy on such aggregated geographical queries is a crucial property of a differentially private dataset, particularly for practitioners. The approach is also designed to minimize false positives and to replicate the sparsity of the sensitive data. The key technical contributions are a novel TopDown algorithm that employs constrained optimization with Chebyshev distance minimization, with theoretical guarantees on the maximum absolute error, and a new integer optimization algorithm that significantly reduces the incidence of false positives. The effectiveness of the proposed approach is validated on both real-world and synthetic O/D datasets, demonstrating its ability to generate private data with high utility and a reduced number of false positives. We emphasize that the proposed algorithm is applicable to any tabular data with a hierarchical structure.
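As an illustration of the quantities discussed above, the following minimal sketch aggregates a city-level O/D table to the province level and measures the Chebyshev distance (maximum absolute error) and the number of false positives of a released table. It is not the proposed algorithm: the hierarchy `PROVINCE_OF`, the helper names, and all counts are hypothetical, and a false positive is assumed here to mean a cell that is zero in the sensitive data but positive in the released data.

```python
from collections import defaultdict

# Hypothetical two-level hierarchy: city -> province (illustrative names only).
PROVINCE_OF = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}


def aggregate(trips, parent_of):
    """Sum O/D counts up to the parent geographical level."""
    out = defaultdict(int)
    for (orig, dest), count in trips.items():
        out[(parent_of[orig], parent_of[dest])] += count
    return dict(out)


def chebyshev(true, released):
    """Maximum absolute error over all cells; a missing cell counts as 0."""
    keys = set(true) | set(released)
    return max(abs(true.get(k, 0) - released.get(k, 0)) for k in keys)


def false_positives(true, released):
    """Cells that are zero in the true table but positive in the release."""
    return sum(1 for k, v in released.items() if v > 0 and true.get(k, 0) == 0)


# Hypothetical sensitive counts and a hypothetical noisy release.
true_trips = {("A1", "B1"): 10, ("A2", "B1"): 5, ("B2", "A1"): 7}
released_trips = {
    ("A1", "B1"): 11,
    ("A2", "B1"): 6,
    ("B2", "A1"): 7,
    ("A1", "B2"): 1,  # zero in the true data, so a false positive
}

city_err = chebyshev(true_trips, released_trips)
prov_err = chebyshev(
    aggregate(true_trips, PROVINCE_OF),
    aggregate(released_trips, PROVINCE_OF),
)
fp = false_positives(true_trips, released_trips)

print(city_err, prov_err, fp)
```

In this toy instance every city-level cell is off by at most 1, yet the province-level query A $\rightarrow$ B is off by 3 because the per-cell errors add up under aggregation. This is the kind of accuracy loss on wider geographical areas that the abstract refers to.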