When releasing record-level data containing sensitive information to the public, the data disseminator is responsible for protecting the privacy of every record in the dataset while preserving important features of the data for users' analyses. These goals can be achieved by data synthesis, where confidential data are replaced with synthetic data simulated from statistical models estimated on the confidential data. In this paper, we present a data synthesis case study in which synthetic values of price and the number of available days in a sample of the New York Airbnb Open Data are created for privacy protection. One sensitive variable, the number of available days of an Airbnb listing, has a large number of zero-valued records and is also truncated at both ends. We propose a zero-inflated truncated Poisson regression model for its synthesis. We then use a sequential synthesis approach to synthesize the sensitive price variable. The resulting synthetic data are evaluated for their utility preservation and privacy protection, the latter in the form of disclosure risks. Furthermore, we propose methods to investigate how uncertainties in an intruder's knowledge would influence the identification disclosure risks of the synthetic data. In particular, we explore several realistic scenarios of uncertainty in the intruder's knowledge of available information and evaluate their impacts on the resulting identification disclosure risks.
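To make the idea of a zero-inflated truncated Poisson concrete, the following is a minimal sketch of drawing one synthetic count from such a distribution. It is an illustration under stated assumptions, not the model specification used in the paper: the parameterization (a structural zero with probability `zero_prob`, otherwise a Poisson count renormalized to the range 1 to 365), the bounds, and the names `truncated_poisson_pmf` and `sample_zitp` are all hypothetical. In a regression setting, `rate` and `zero_prob` would depend on each listing's covariates; here they are fixed numbers.

```python
import math
import random


def truncated_poisson_pmf(rate, lower, upper):
    """Return P(Y = k) for k = lower..upper under a Poisson(rate)
    distribution renormalized to that range (assumed parameterization)."""
    # Work on the log scale so rate**k / k! does not overflow for large k.
    log_weights = [
        k * math.log(rate) - rate - math.lgamma(k + 1)
        for k in range(lower, upper + 1)
    ]
    peak = max(log_weights)
    weights = [math.exp(w - peak) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def sample_zitp(rate, zero_prob, lower=1, upper=365, rng=random):
    """Draw one value from a zero-inflated truncated Poisson.

    With probability zero_prob the draw is a structural zero; otherwise it
    comes from a Poisson(rate) truncated to [lower, upper]. The bounds 1 and
    365 are an assumption motivated by 'available days in a year'.
    """
    if rng.random() < zero_prob:
        return 0
    pmf = truncated_poisson_pmf(rate, lower, upper)
    return rng.choices(range(lower, upper + 1), weights=pmf, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(2024)
    synthetic_days = [sample_zitp(rate=120.0, zero_prob=0.4, rng=rng) for _ in range(10)]
    print(synthetic_days)
```

Separating the zero component from the truncated count component is what lets the sketch reproduce both the excess of zeros and the hard upper bound; a plain Poisson could do neither.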
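Airbnb (https://zh.airbnb.com/?af=83334047) was founded in August 2008 and is headquartered in San Francisco, California. Airbnb is a trusted community marketplace where people can list, discover, and book unique accommodations around the world through its website or from a mobile phone or tablet. Whether it is an apartment for a night, a castle for a week, or a villa for a month, Airbnb offers one-of-a-kind stays at any price point in more than 34,000 cities across 191 countries.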