This paper investigates the problem of collecting multidimensional data over time (i.e., longitudinal studies) for the fundamental task of frequency estimation under local differential privacy (LDP). In contrast to frequency estimation of a single attribute (the focus of most prior work), the multidimensional setting requires particular attention to how the privacy budget is spent. Moreover, when user statistics are collected longitudinally, privacy progressively degrades. Combining both "multiple" settings (i.e., many attributes and several collections over time) poses several challenges, for which this paper proposes the first solution for frequency estimation under LDP. To tackle these issues, we extend the analysis of three state-of-the-art LDP protocols (Generalized Randomized Response -- GRR, Optimized Unary Encoding -- OUE, and Symmetric Unary Encoding -- SUE) to both longitudinal and multidimensional data collection. While the known literature uses OUE and SUE for two rounds of sanitization (a.k.a. memoization), i.e., L-OUE and L-SUE, respectively, we show analytically and experimentally that starting with OUE and then applying SUE (i.e., L-OSUE) provides higher data utility. In addition, for attributes with small domain sizes, we propose longitudinal GRR (L-GRR), which provides higher utility than the other protocols based on unary encoding. Lastly, we propose a new solution named \underline{A}daptive \underline{L}DP for \underline{LO}ngitudinal and \underline{M}ultidimensional \underline{FRE}quency \underline{E}stimates (ALLOMFREE), which randomly samples a single attribute to report with the whole privacy budget and adaptively selects the optimal protocol, i.e., either L-GRR or L-OSUE. As shown in the results, ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimates.
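To make the building blocks concrete, the following is a minimal Python sketch of one-shot GRR frequency estimation combined with the idea of each user reporting a single, uniformly sampled attribute with the whole privacy budget. It is not the paper's implementation: it uses the standard textbook GRR probabilities p = e^ε / (e^ε + k − 1) and q = 1 / (e^ε + k − 1), it omits the second sanitization round (memoization) that defines L-GRR and L-OSUE, and it does not perform the adaptive protocol selection. All function names (`grr_probabilities`, `grr_perturb`, `grr_estimate`, `collect_one_attribute`) are hypothetical.

```python
import math
import random
from collections import Counter


def grr_probabilities(epsilon, k):
    """Standard GRR probabilities for a domain of size k (assumes k >= 2)."""
    e = math.exp(epsilon)
    p = e / (e + k - 1)    # probability of reporting the true value
    q = 1.0 / (e + k - 1)  # probability of reporting one specific other value
    return p, q


def grr_perturb(value, domain, epsilon, rng):
    """Report the true value with probability p, else a uniform other value."""
    p, _ = grr_probabilities(epsilon, len(domain))
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def grr_estimate(reports, domain, epsilon):
    """Unbiased frequency estimate: (C(v)/n - q) / (p - q) for each value v."""
    n = len(reports)
    p, q = grr_probabilities(epsilon, len(domain))
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


def collect_one_attribute(users, domains, epsilon, rng):
    """Each user samples one attribute uniformly and reports it with the full
    budget epsilon (illustrative only; no memoization round is applied)."""
    reports = [[] for _ in domains]
    for record in users:
        j = rng.randrange(len(domains))  # index of the sampled attribute
        reports[j].append(grr_perturb(record[j], domains[j], epsilon, rng))
    # Estimate each attribute from the users who happened to report it.
    return [
        grr_estimate(r, dom, epsilon) if r else None
        for r, dom in zip(reports, domains)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    domains = [["a", "b", "c"], [0, 1]]
    # Synthetic population, made up purely for illustration.
    users = [(rng.choice(["a", "a", "b", "c"]), rng.choice([0, 1, 1]))
             for _ in range(20000)]
    for estimate in collect_one_attribute(users, domains, 2.0, rng):
        print(estimate)
```

Because each user spends the entire budget on one attribute, every report is less noisy than it would be if ε were split across all attributes, at the cost of each attribute being estimated from only a fraction of the users.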