The emerging public awareness and government regulation of data privacy motivate new paradigms for collecting and analyzing data that are transparent and acceptable to data owners. We present a new concept of privacy, together with the corresponding data formats, mechanisms, and tradeoffs for privatizing data during data collection. This notion, named Interval Privacy, requires that the conditional distribution of the raw data given the privatized data be the same as its unconditional distribution over a nontrivial support set. Correspondingly, the proposed privacy mechanism records each data value as a random interval containing it. The proposed interval privacy mechanisms can be easily deployed through most existing survey-based data collection paradigms, e.g., by asking a respondent whether their data value is within a randomly generated range. Another unique feature of interval mechanisms is that they obfuscate the truth but do not distort it. Conveying information through a narrowed range is complementary to the popular paradigm of perturbing data. In addition, interval mechanisms can generate progressively refined information at the discretion of individual respondents. We study several theoretical aspects of the proposed privacy. In the context of supervised learning, we also offer a method by which existing supervised learning algorithms designed for point-valued data can be directly applied to learning from interval-valued data.
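As a rough illustration of the survey-style mechanism described above, the following Python sketch records a value only as an interval cut out by randomly generated thresholds. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `draw_thresholds` and `privatize`, the uniform threshold distribution, the range [0, 100], and the half-open interval convention (lo, hi] are all hypothetical choices made for the example. The one property it relies on from the abstract is that the thresholds are generated independently of the respondent's value.

```python
import math
import random


def draw_thresholds(rng, count, low=0.0, high=100.0):
    """Draw query thresholds independently of any respondent's value.

    The uniform distribution and the [low, high] range are assumptions.
    """
    return [rng.uniform(low, high) for _ in range(count)]


def privatize(value, thresholds):
    """Return the interval (lo, hi] containing `value` implied by yes/no answers.

    Each threshold q stands for one survey question: "is your value <= q?".
    Only the answers shape the interval; the raw value itself is not stored.
    """
    lo, hi = -math.inf, math.inf
    for q in thresholds:
        if value <= q:
            hi = min(hi, q)  # answer "yes": value lies at or below q
        else:
            lo = max(lo, q)  # answer "no": value lies above q
    return lo, hi


rng = random.Random(0)
thresholds = draw_thresholds(rng, 3)
income = 42.0  # hypothetical raw value, never recorded directly

coarse = privatize(income, thresholds[:1])  # respondent answers one question
refined = privatize(income, thresholds)     # respondent opts into more questions
print(coarse, refined)
```

The recorded interval always contains the true value, so the record is coarse but never false. A respondent who answers further questions only narrows the interval, which mirrors the progressive refinement mentioned in the abstract.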