The abundance of data collected by sensors in Internet of Things (IoT) devices, together with the success of deep neural networks in uncovering hidden patterns in time series data, has led to mounting privacy concerns. This is because private and sensitive information can potentially be learned from sensor data by applications that have access to it. In this paper, we examine the tradeoff between utility and privacy loss by learning low-dimensional representations that are useful for data obfuscation. We propose deterministic and probabilistic transformations in the latent space of a variational autoencoder to synthesize time series data such that intrusive inferences are prevented while desired inferences can still be made with sufficient accuracy. In the deterministic case, we use a linear transformation to move the latent representation of input data such that the reconstructed data is likely to have the same public attribute as, but a different private attribute than, the original input. In the probabilistic case, we apply the same linear transformation to the latent representation with some probability. We compare our technique with autoencoder-based anonymization techniques and additionally show that it can anonymize data in real time on resource-constrained edge devices.
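The latent-space manipulation described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual model: the encoder/decoder are stand-in linear maps, and the shift direction is assumed to be the difference between the latent means of the source and target private-attribute classes. The probabilistic variant simply applies the shift with probability `p`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a trained VAE encoder/decoder (assumptions for
# illustration only): simple linear maps between a 16-dim sensor window and
# an 8-dim latent code.
W_enc = rng.normal(size=(8, 16))
W_dec = np.linalg.pinv(W_enc)  # toy decoder: pseudo-inverse of the encoder

def encode(x):
    return W_enc @ x

def decode(z):
    return W_dec @ z

def obfuscate(x, mu_src, mu_tgt, alpha=1.0, p=1.0, rng=rng):
    """Shift the latent code of x from the source private-attribute class
    mean toward the target class mean, then reconstruct.

    p = 1.0 gives the deterministic variant; p < 1.0 applies the shift
    only with probability p (the probabilistic variant)."""
    z = encode(x)
    if rng.random() < p:
        z = z + alpha * (mu_tgt - mu_src)
    return decode(z)

# Example: one sensor window and two (assumed) latent class means.
x = rng.normal(size=16)
mu_src = rng.normal(size=8)   # latent mean of the original private class
mu_tgt = rng.normal(size=8)   # latent mean of the decoy private class
x_anon = obfuscate(x, mu_src, mu_tgt, p=1.0)
```

In the real system the class means would be estimated from labeled training data in the learned latent space, and the public-attribute content is preserved because the shift direction is (by construction of the representation) largely orthogonal to the dimensions encoding the public attribute.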