The explosion of data collection has raised serious privacy concerns among users, since sharing data may also reveal sensitive information. The main goal of a privacy-preserving mechanism is to prevent a malicious third party from inferring sensitive information while keeping the shared data useful. In this paper, we study this problem in the context of time series data, and of smart meter (SM) power consumption measurements in particular. Although Mutual Information (MI) between private and released variables has been widely used as an information-theoretic privacy measure, it fails to capture the causal time dependencies present in power consumption time series data. To overcome this limitation, we introduce Directed Information (DI) as a more meaningful privacy measure in the considered setting and propose a novel loss function. The optimization is then carried out in an adversarial framework in which two Recurrent Neural Networks (RNNs), referred to as the releaser and the adversary, are trained with opposing goals. Our empirical studies on real-world SM measurement datasets, conducted in the worst-case scenario where the attacker has access to the entire training dataset used by the releaser, validate the proposed method and illustrate the trade-off between privacy and utility.
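Unlike MI, directed information is asymmetric and accumulates only causal (past-to-present) dependence: I(X^T → Y^T) = Σ_t I(X^t; Y_t | Y^{t-1}). As a minimal illustration (not the paper's RNN-based mechanism), the following sketch computes a plug-in estimate of DI for short discrete sequences; the helper names `plug_in_entropy` and `directed_information` are our own. When Y simply copies X with a one-step delay, DI is large in the causal direction X → Y and zero in the reverse direction, whereas MI between the full sequences would be symmetric:

```python
from collections import Counter
from itertools import product
from math import log2


def plug_in_entropy(symbols):
    """Empirical (plug-in) Shannon entropy in bits of a list of hashable symbols."""
    n = len(symbols)
    return -sum(c / n * log2(c / n) for c in Counter(symbols).values())


def directed_information(pairs):
    """Plug-in estimate of I(X^T -> Y^T) = sum_t I(X^t; Y_t | Y^{t-1})
    from a list of (x_sequence, y_sequence) sample pairs of equal length T."""
    T = len(pairs[0][0])
    di = 0.0
    for t in range(1, T + 1):
        a = [tuple(x[:t]) for x, _ in pairs]        # X^t  (past and present of X)
        b = [y[t - 1] for _, y in pairs]            # Y_t  (present of Y)
        c = [tuple(y[:t - 1]) for _, y in pairs]    # Y^{t-1}  (past of Y)
        # Conditional MI via the identity I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)
        di += (plug_in_entropy(list(zip(a, c)))
               + plug_in_entropy(list(zip(b, c)))
               - plug_in_entropy(list(zip(a, b, c)))
               - plug_in_entropy(c))
    return di


# Toy causal channel: Y is a one-step-delayed copy of X (Y_1 fixed to 0).
# Enumerating all binary X sequences of length 3 gives the exact distribution.
pairs = [(x, (0, x[0], x[1])) for x in product([0, 1], repeat=3)]

forward = directed_information(pairs)                       # DI(X -> Y): 2 bits
reverse = directed_information([(y, x) for x, y in pairs])  # DI(Y -> X): 0 bits
print(forward, reverse)
```

Here the forward DI recovers the two bits of X that causally determine Y, while the reverse DI vanishes because X never depends on Y's past, which is exactly the causal distinction that a symmetric MI measure cannot express.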