Improving the utility of locally differentially private protocols for longitudinal and multidimensional frequency estimates

This paper investigates the problem of collecting multidimensional data over time (i.e., longitudinal studies) for the fundamental task of frequency estimation under Local Differential Privacy (LDP) guarantees. Unlike frequency estimation for a single attribute, the multidimensional setting demands particular attention to how the privacy budget is spent. Moreover, when user statistics are collected longitudinally, privacy progressively degrades. The combination of these "multiple" settings (i.e., many attributes and several collections over time) poses several challenges, for which this paper proposes the first solution for frequency estimates under LDP. To tackle these issues, we extend the analysis of three state-of-the-art LDP protocols, Generalized Randomized Response (GRR), Optimized Unary Encoding (OUE), and Symmetric Unary Encoding (SUE), to both longitudinal and multidimensional data collections. Whereas the existing literature applies either OUE or SUE in both of the two rounds of sanitization (a.k.a. memoization), yielding L-OUE and L-SUE, respectively, we show analytically and experimentally that applying OUE in the first round and SUE in the second (i.e., L-OSUE) provides higher data utility. In addition, for attributes with small domain sizes, we propose Longitudinal GRR (L-GRR), which provides higher utility than the protocols based on unary encoding. Finally, we propose a new solution named Adaptive LDP for LOngitudinal and Multidimensional FREquency Estimates (ALLOMFREE), which randomly samples a single attribute to be sent with the whole privacy budget and adaptively selects the optimal protocol, i.e., either L-GRR or L-OSUE. As shown in the results, ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimates.
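
To make the building blocks named above more concrete, the following Python sketch shows single-round GRR and OUE perturbation with their standard unbiased frequency estimators, plus the idea of sampling one attribute and spending the whole budget on it. It is only an illustration under stated assumptions: all function names are hypothetical, the memoization (two-round) step is omitted, and the protocol choice uses the well-known single-round variance comparison between GRR and OUE, not the selection rule that ALLOMFREE applies to L-GRR and L-OSUE.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """GRR: report the true value with probability p, else a uniform other value."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


def grr_estimate(reports, domain, epsilon):
    """Unbiased GRR estimate per value: (C(v)/n - q) / (p - q)."""
    k, n = len(domain), len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = 1.0 / (math.exp(epsilon) + k - 1)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


def oue_perturb(value, domain, epsilon, rng):
    """OUE: one-hot encode; keep the 1 bit w.p. 1/2, set each 0 bit w.p. 1/(e^eps + 1)."""
    p, q = 0.5, 1.0 / (math.exp(epsilon) + 1)
    return [int(rng.random() < (p if v == value else q)) for v in domain]


def oue_estimate(reports, domain, epsilon):
    """Unbiased OUE estimate per value from the per-position bit sums."""
    n = len(reports)
    p, q = 0.5, 1.0 / (math.exp(epsilon) + 1)
    return {v: (sum(r[i] for r in reports) / n - q) / (p - q)
            for i, v in enumerate(domain)}


def choose_protocol(k, epsilon):
    """Assumed single-round rule: GRR has lower approximate variance when k < 3e^eps + 2."""
    return "GRR" if k < 3 * math.exp(epsilon) + 2 else "OUE"


def report_one_attribute(record, domains, epsilon, rng):
    """Sample one attribute uniformly and report it with the whole budget epsilon."""
    j = rng.randrange(len(domains))
    domain = domains[j]
    if choose_protocol(len(domain), epsilon) == "GRR":
        return j, grr_perturb(record[j], domain, epsilon, rng)
    return j, oue_perturb(record[j], domain, epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    domain = ["a", "b", "c", "d"]
    users = ["a"] * 2500 + ["b"] * 1500 + ["c"] * 1000
    reports = [grr_perturb(v, domain, 2.0, rng) for v in users]
    print(grr_estimate(reports, domain, 2.0))
```

On the server side, the reports returned by `report_one_attribute` would be grouped by the sampled attribute index and passed to the matching estimator, so each attribute is estimated from roughly n/d users, where d is the number of attributes. The trade-off sketched here is that each user contributes to fewer attributes but with less noise per report, compared with splitting the budget across all of them.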
