Household survey programs around the world publish fine-grained georeferenced microdata to support research on the interdependence of human livelihoods and their surrounding environment. To safeguard respondents' privacy, micro-level survey data are usually (pseudo-)anonymized through deletion or perturbation procedures, such as obfuscating the true location of data collection. This, however, poses a challenge to emerging approaches that augment survey data with auxiliary information at the local level. Here, we propose an alternative microdata dissemination strategy that combines the utility of the original microdata with additional privacy safeguards through synthetic data produced by generative models. We back our proposal with experiments using data from the 2011 Costa Rican census and satellite-derived auxiliary information. Our strategy reduces respondents' re-identification risk for any number of disclosed attributes by 60-80%, even under re-identification attempts.