Data Privacy Protection and Utility Preservation through Bayesian Data Synthesis: A Case Study on Airbnb Listings

When releasing record-level data containing sensitive information to the public, the data disseminator is responsible for protecting the privacy of every record in the dataset while simultaneously preserving important features of the data for users' analyses. These goals can be achieved by data synthesis, in which confidential data are replaced with synthetic data simulated from statistical models estimated on the confidential data. In this paper, we present a data synthesis case study in which synthetic values of price and of the number of available days are created for a sample of the New York Airbnb Open Data to protect privacy. One sensitive variable, the number of available days of an Airbnb listing, has a large number of zero-valued records and is also truncated at both ends. We propose a novel zero-inflated truncated Poisson regression model for its synthesis. We then use a sequential synthesis approach to synthesize the sensitive price variable. The resulting synthetic data are evaluated for their utility preservation and privacy protection, the latter in the form of disclosure risks. Furthermore, we propose methods to investigate how uncertainty in an intruder's knowledge would influence the identification disclosure risks of the synthetic data. In particular, we explore several realistic scenarios of uncertainty in an intruder's knowledge of the available information and evaluate their impact on the resulting identification disclosure risks.
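
The abstract does not spell out the form of the zero-inflated truncated Poisson model, so the following is only a minimal sketch under stated assumptions: a point mass at zero with probability `p_zero`, mixed with a Poisson(`lam`) distribution truncated to `[lower, upper]`, with a logit link for `p_zero` and a log link for `lam`. The default bounds 0 and 365 (suggested by a yearly availability count), the link functions, the coefficients, and all function names (`link`, `zitp_pmf`, `draw_zitp`) are hypothetical; the paper's exact parameterization may differ. In a Bayesian synthesis, the coefficients would be posterior draws from a model fitted to the confidential data, and each synthetic value would be drawn from the model given those draws.

```python
import math
import random
from typing import Sequence, Tuple

# Hypothetical sketch of a zero-inflated truncated Poisson (ZITP) data model.
# Bounds, links, and names are assumptions, not the paper's specification.

def poisson_logpmf(k: int, lam: float) -> float:
    """Log of the Poisson(lam) probability mass at k (requires lam > 0)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def link(x: Sequence[float], beta: Sequence[float],
         gamma: Sequence[float]) -> Tuple[float, float]:
    """Map covariates x to (p_zero, lam) via a logit link and a log link."""
    eta_p = sum(g * xi for g, xi in zip(gamma, x))
    eta_lam = sum(b * xi for b, xi in zip(beta, x))
    p_zero = 1.0 / (1.0 + math.exp(-eta_p))
    lam = math.exp(eta_lam)
    return p_zero, lam

def zitp_pmf(y: int, p_zero: float, lam: float,
             lower: int = 0, upper: int = 365) -> float:
    """P(Y = y): point mass at 0 mixed with Poisson(lam) truncated to [lower, upper]."""
    support = range(lower, upper + 1)
    # Normalizing constant: Poisson mass inside the truncation window.
    norm = sum(math.exp(poisson_logpmf(k, lam)) for k in support)
    trunc = math.exp(poisson_logpmf(y, lam)) / norm if lower <= y <= upper else 0.0
    structural_zero = 1.0 if y == 0 else 0.0
    return p_zero * structural_zero + (1.0 - p_zero) * trunc

def draw_zitp(p_zero: float, lam: float, rng: random.Random,
              lower: int = 0, upper: int = 365) -> int:
    """Draw one synthetic value from the ZITP model."""
    if rng.random() < p_zero:
        return 0  # structural zero
    support = range(lower, upper + 1)
    # random.choices samples in proportion to the weights, so they need not sum to 1.
    weights = [math.exp(poisson_logpmf(k, lam)) for k in support]
    return rng.choices(support, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(2021)
    # Hypothetical posterior draw of coefficients for covariates [1, x1].
    beta, gamma = [3.5, 0.2], [-0.5, 0.1]
    p0, lam = link([1.0, 1.0], beta, gamma)
    synthetic = [draw_zitp(p0, lam, rng) for _ in range(5)]
    print(p0, lam, synthetic)
```

With `lower=0`, a zero can come from either the point mass or the truncated Poisson component; setting `lower=1` makes every zero structural. The abstract does not say which variant the paper uses. In a sequential synthesis, the synthesized available-days value would then enter as a covariate in the model used to synthesize price.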
