Population synthesis consists of generating synthetic but realistic representations of a target population of micro-agents for behavioral modeling and simulation. We introduce a new copula-based framework that generates synthetic data for a target population for which only the empirical marginal distributions are known, using a sample from another population sharing similar marginal dependencies. This makes it possible to include a spatial component in population synthesis and to combine various sources of information to obtain more realistic population generators. Specifically, we normalize the data and treat them as realizations of a given copula, then train a generative model on the normalized data before injecting the information on the marginals. We compare the copula framework to iterative proportional fitting (IPF) and to modern probabilistic approaches such as Bayesian networks, variational auto-encoders, and generative adversarial networks. We also illustrate on American Community Survey data that the proposed method makes it possible to study the structure of the data at different geographical levels in a way that is robust to the peculiarities of the marginal distributions.
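The normalize, fit, and inject-marginals pipeline described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: a Gaussian copula stands in for the generative model trained on the normalized data, all function names are hypothetical, and the columns are assumed to be numeric.

```python
import numpy as np
from statistics import NormalDist

# Standard normal quantile function and CDF, vectorized over arrays.
_STD_NORMAL = NormalDist()
_norm_ppf = np.vectorize(_STD_NORMAL.inv_cdf)
_norm_cdf = np.vectorize(_STD_NORMAL.cdf)


def to_pseudo_observations(x):
    """Normalization step: rank-transform each column into (0, 1)."""
    n = x.shape[0]
    # Double argsort gives ranks 0..n-1 per column (ties broken arbitrarily).
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / (n + 1.0)


def fit_gaussian_copula(u):
    """Assumed stand-in model: correlation matrix of the normal scores."""
    z = _norm_ppf(u)
    return np.corrcoef(z, rowvar=False)


def sample_gaussian_copula(corr, n, rng):
    """Draw n points in the unit cube with the fitted dependence."""
    z = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n)
    return _norm_cdf(z)


def inject_marginals(u, target_marginals):
    """Map each uniform column through the target's empirical quantiles."""
    cols = [np.quantile(m, u[:, j]) for j, m in enumerate(target_marginals)]
    return np.column_stack(cols)


def synthesize(source_sample, target_marginals, n, seed=0):
    """Source sample supplies dependence; target supplies the marginals."""
    rng = np.random.default_rng(seed)
    u_source = to_pseudo_observations(source_sample)
    corr = fit_gaussian_copula(u_source)
    u_new = sample_gaussian_copula(corr, n, rng)
    return inject_marginals(u_new, target_marginals)
```

Here `source_sample` is an array of shape `(n_source, d)` from the population with known joint data, and `target_marginals` is a list of `d` one-dimensional arrays holding the target population's observed values per variable. `np.quantile` interpolates linearly by default, so categorical or integer-coded survey variables would need a step-function quantile instead.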