Differential Privacy for Symbolic Systems with Application to Markov Chains

Data-driven systems are gathering increasing amounts of data from users, and sensitive user data requires privacy protections. In some cases, the data gathered is non-numerical or symbolic, and conventional approaches to privacy, e.g., adding noise, do not apply, though such systems still require privacy protections. Accordingly, we present a novel differential privacy framework for protecting trajectories generated by symbolic systems. These trajectories can be represented as words or strings over a finite alphabet. We develop new differential privacy mechanisms that approximate a sensitive word using a random word that is likely to be near it. An offline mechanism is implemented efficiently using a Modified Hamming Distance Automaton to generate whole privatized output words over a finite time horizon. Then, an online mechanism is implemented by taking in a sensitive symbol and generating a randomized output symbol at each timestep. This work is extended to Markov chains to generate differentially private state sequences that a given Markov chain could have produced. Statistical accuracy bounds are developed to quantify the accuracy of these mechanisms, and numerical results validate the accuracy of these techniques for strings of English words.
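To make the idea of "a random word that is likely to be near" the sensitive word concrete, here is a minimal Python sketch. It is an illustration only, not the paper's mechanism: it does not use the Modified Hamming Distance Automaton, and it assumes the output probability decays exponentially with Hamming distance. Under that assumption the distribution factorizes across positions, so each symbol can be randomized independently as it arrives, one per timestep. The names `privatize_symbol`, `privatize_word`, and the parameter `scale` are hypothetical. How `scale` relates to a formal privacy parameter depends on the adjacency relation, which the abstract does not specify.

```python
import math
import random


def hamming(u, v):
    """Number of positions at which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))


def privatize_symbol(symbol, alphabet, scale, rng):
    """Randomize one symbol (hypothetical helper, not from the paper).

    The symbol is kept with probability 1 / (1 + (m - 1) * exp(-scale)),
    where m is the alphabet size; otherwise a different symbol is drawn
    uniformly at random.
    """
    m = len(alphabet)
    keep_prob = 1.0 / (1.0 + (m - 1) * math.exp(-scale))
    if rng.random() < keep_prob:
        return symbol
    others = [a for a in alphabet if a != symbol]
    return rng.choice(others)


def privatize_word(word, alphabet, scale, rng=None):
    """Randomize a word symbol by symbol.

    Because the symbols are handled independently, the probability of an
    output word is proportional to exp(-scale * hamming(word, output)).
    """
    rng = rng or random.Random()
    return "".join(privatize_symbol(s, alphabet, scale, rng) for s in word)


if __name__ == "__main__":
    sensitive = "abcabc"
    noisy = privatize_word(sensitive, "abc", scale=1.0)
    print(noisy, hamming(sensitive, noisy))
```

A larger `scale` keeps the output closer to the sensitive word, while `scale = 0` makes every symbol uniform over the alphabet. Note that this sketch does not check that the output is a sequence a given Markov chain could have produced, which the Markov chain extension described above requires.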

Markov chains

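A Markov chain, named after Andrey Markov (A. A. Markov, 1856-1922), is a discrete-event stochastic process that has the Markov property. In such a process, given the present state, the past (the states before the present) is irrelevant for predicting the future (the states after the present). At each step, the system may move from one state to another, or remain in its current state, according to a probability distribution. A change of state is called a transition, and the probabilities associated with the different state changes are called transition probabilities. A random walk is an example of a Markov chain: the state at each step is a vertex of a graph, and each step moves to any adjacent vertex with equal probability, regardless of the path taken so far.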
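The random walk described above can be sketched in a few lines of Python. The graph, its vertex labels, and the function name `random_walk` are made up for illustration. The next vertex depends only on the current one, which is the Markov property.

```python
import random


def random_walk(graph, start, steps, rng=None):
    """Simulate a random walk on a graph given as an adjacency list.

    At each step the walk moves to a neighbor of the current vertex,
    chosen uniformly at random and independently of the earlier path.
    """
    rng = rng or random.Random()
    path = [start]
    for _ in range(steps):
        current = path[-1]
        path.append(rng.choice(graph[current]))
    return path


# A small undirected example graph (illustrative only).
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

if __name__ == "__main__":
    print(random_walk(GRAPH, "A", 10))
```

Here the transition probability from a vertex to each of its neighbors is one over the number of neighbors. A general Markov chain would replace the uniform choice with an arbitrary distribution per state, which may also assign positive probability to staying in place.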
