Emerging public awareness and government regulation of data privacy motivate new paradigms for collecting and analyzing data that are transparent and acceptable to data owners. We present a new concept of privacy, together with corresponding data formats, mechanisms, and theory, for privatizing data during data collection. The privacy notion, named Interval Privacy, requires the conditional distribution of the raw data given the privatized data to be the same as its unconditional distribution over a nontrivial support set. Correspondingly, the proposed privacy mechanism records each data value as a random interval (or, more generally, a range) containing it. The proposed interval privacy mechanisms can be easily deployed through survey-based data collection interfaces, e.g., by asking a respondent whether their data value lies within a randomly generated range. Another unique feature of interval mechanisms is that they obfuscate the truth but do not perturb it. Using a narrowed range to convey information is complementary to the popular paradigm of perturbing data. In addition, interval mechanisms can generate progressively refined information at the discretion of individuals, naturally leading to privacy-adaptive data collection. We develop several aspects of the theory, including composition, robustness, distribution estimation, and regression learning from interval-valued data. Interval privacy provides a new perspective on human-centric data privacy in which individuals have a perceptible, transparent, and simple way of sharing sensitive data.
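To make the survey-style mechanism concrete, the following is a minimal sketch and not the paper's exact construction. It assumes a value known to lie in a bounded range `(low, high]`, thresholds drawn uniformly from the current interval without looking at the value, and a respondent who truthfully answers whether the value is at most the threshold. The function name `interval_privatize` and its parameters are hypothetical, introduced only for illustration.

```python
import random
from typing import Optional, Tuple

# An interval (low, high] is read as: low < x <= high.
Interval = Tuple[float, float]


def interval_privatize(
    x: float,
    low: float,
    high: float,
    rounds: int = 1,
    rng: Optional[random.Random] = None,
) -> Interval:
    """Report a random interval containing x instead of x itself.

    Illustrative assumption: each threshold is drawn uniformly from the
    current interval and never depends on x directly; only the yes/no
    answer to "is x <= threshold?" is used to narrow the interval.
    """
    if rng is None:
        rng = random.Random()
    if not (low < x <= high):
        raise ValueError("x must satisfy low < x <= high")

    for _ in range(rounds):
        threshold = rng.uniform(low, high)  # generated without looking at x
        if x <= threshold:
            high = threshold  # respondent answers "yes"
        else:
            low = threshold   # respondent answers "no"
    return (low, high)


if __name__ == "__main__":
    rng = random.Random(0)
    # One question gives a coarse interval; more questions refine it.
    print(interval_privatize(37.0, 0.0, 100.0, rounds=1, rng=rng))
    print(interval_privatize(37.0, 0.0, 100.0, rounds=4, rng=rng))
```

The reported interval always contains the true value, so the truth is obfuscated rather than perturbed. The `rounds` argument stands in for the respondent's discretion: answering more questions yields a narrower interval and therefore less privacy.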