Growing public awareness and government regulation of data privacy motivate new paradigms for collecting and analyzing data that are transparent and acceptable to data owners. We present a new concept of privacy, together with corresponding data formats, mechanisms, and theory, for privatizing data during data collection. The concept, named Interval Privacy, requires that the conditional distribution of the raw data given the privatized data be the same as its unconditional distribution over a nontrivial support set. Correspondingly, the proposed privacy mechanism records each data value as a random interval (or, more generally, a range) containing it. Interval privacy mechanisms can be easily deployed through survey-based data collection interfaces, e.g., by asking a respondent whether their data value lies within a randomly generated range. Another unique feature of interval mechanisms is that they obfuscate the truth but do not perturb it. Using a narrowed range to convey information is complementary to the popular paradigm of perturbing data. In addition, interval mechanisms can generate progressively refined information at the discretion of individuals, naturally leading to privacy-adaptive data collection. We develop several aspects of the theory, including composition, robustness, distribution estimation, and regression learning from interval-valued data. Interval privacy provides a new perspective on human-centric data privacy, in which individuals have a perceptible, transparent, and simple way of sharing sensitive data.
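As a rough illustration of the survey-style mechanism described above, the following Python sketch records a value only as an interval that contains it. The function name `interval_privatize`, the known bounds `lo` and `hi`, and the use of uniformly distributed thresholds are assumptions made for this example; the abstract does not specify them. Each threshold is drawn independently of the raw value, and each additional round corresponds to the respondent choosing to answer one more yes/no question.

```python
import random


def interval_privatize(x, lo, hi, rounds=1, rng=None):
    """Return an interval (a, b) with a <= x <= b, without ever recording x.

    Illustrative sketch only: thresholds are uniform on [lo, hi] and are
    drawn independently of x (an assumption of this example).
    """
    if not lo <= x <= hi:
        raise ValueError("x must lie within the known bounds [lo, hi]")
    rng = rng or random.Random()
    a, b = lo, hi
    for _ in range(rounds):
        u = rng.uniform(lo, hi)  # random threshold, independent of x
        if not a < u < b:
            # The answer is already implied by earlier responses,
            # so this question reveals nothing new.
            continue
        if x <= u:
            b = u  # respondent answers "yes, my value is at most u"
        else:
            a = u  # respondent answers "no, my value is above u"
    return a, b


# Usage: more rounds yield a narrower (more informative) interval.
coarse = interval_privatize(37.0, 0.0, 100.0, rounds=1, rng=random.Random(0))
fine = interval_privatize(37.0, 0.0, 100.0, rounds=3, rng=random.Random(0))
print(coarse, fine)
```

The reported interval always contains the true value, so the data are obfuscated rather than perturbed, and a respondent can stop after any round to limit how much is revealed.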