Aggregate time-series data such as traffic flow and site occupancy repeatedly sample statistics from a population across time. Such data can be profoundly useful for understanding trends within a given population, but they also pose a significant privacy risk, potentially revealing, for example, who spends time where. Producing a private version of a time series that satisfies the standard definition of Differential Privacy (DP) is challenging because of the large influence a single participant can have on the sequence: if an individual can contribute to every time step, the amount of additive noise needed to satisfy privacy increases linearly with the number of time steps sampled. As a result, if a signal spans a long duration or is oversampled, an excessive amount of noise must be added, drowning out the underlying trends. In many applications, however, an individual realistically cannot participate at every time step. When this is the case, we observe that the influence of a single participant (the sensitivity) can be reduced by subsampling and/or filtering in time, while still meeting privacy requirements. Using a novel analysis, we show this significant reduction in sensitivity and propose a corresponding class of privacy mechanisms. We demonstrate the utility benefits of these techniques empirically with real-world and synthetic time-series data.
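To make the linear growth in noise concrete, the following is a minimal sketch of the standard Laplace mechanism applied to a series of counts. It is not the mechanism proposed in this work. It assumes that each individual changes any single count by at most 1, so the L1 sensitivity equals the maximum number of time steps one individual can appear in. The function names `laplace_scale` and `privatize_counts` and the parameter `max_steps_per_person` are hypothetical and chosen for illustration only.

```python
import numpy as np


def laplace_scale(max_steps_per_person: int, epsilon: float) -> float:
    """Laplace noise scale b = sensitivity / epsilon for a series of counts.

    Assumption: one person changes each count by at most 1 and appears in at
    most `max_steps_per_person` time steps, so the L1 sensitivity is that cap.
    """
    if max_steps_per_person < 1:
        raise ValueError("max_steps_per_person must be at least 1")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return max_steps_per_person / epsilon


def privatize_counts(counts, max_steps_per_person, epsilon, rng=None):
    """Add independent Laplace noise to every time step of a count series."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    scale = laplace_scale(max_steps_per_person, epsilon)
    return counts + rng.laplace(loc=0.0, scale=scale, size=counts.shape)


if __name__ == "__main__":
    epsilon = 1.0
    # If a person may contribute to every step, the scale grows with length T.
    for num_steps in (10, 100, 1000):
        print(num_steps, laplace_scale(num_steps, epsilon))
    # If a person can appear in at most 5 steps, the scale no longer depends on T.
    print("capped:", laplace_scale(5, epsilon))
```

Under these assumptions, the per-step noise scale is proportional to the number of time steps an individual can affect, which is why bounding that number is what allows less noise for the same privacy budget.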