The idea of generating synthetic data as a tool for broadening access to sensitive microdata was first proposed three decades ago. While the first applications of the idea emerged around the turn of the century, the approach has really gained momentum over the last ten years, stimulated at least in part by recent developments in computer science. We consider the upcoming 30th jubilee of Rubin's seminal paper on synthetic data (Rubin, 1993) as an opportunity to look back at the historical developments, but also to offer a review of the diverse approaches and methodological underpinnings proposed over the years. We will also discuss the various strategies that have been suggested to measure the utility and remaining risk of disclosure of the generated data.