Aggregate data appear in many fields, such as socio-economics and public security. Such data are associated not with points but with supports (e.g., spatial regions in a city). Since the supports may have different granularities depending on the attribute (e.g., poverty rate and crime rate), modeling such data is not straightforward. This article proposes a multi-output Gaussian process (MoGP) model that infers a function for each attribute using multiple aggregate datasets, each with its own granularity. In the proposed model, the function for each attribute is assumed to be a dependent GP modeled as a linear mixing of independent latent GPs. We design an observation model with an aggregation process for each attribute; the process is an integral of the GP over the corresponding support. We also introduce a prior distribution over the mixing weights; sharing this prior allows knowledge transfer across domains (e.g., cities). This is advantageous when the spatially aggregated dataset in a city is too coarse to interpolate: the proposed model can still make accurate predictions of attributes by utilizing aggregate datasets from other cities. Inference is based on variational Bayes, which enables one to learn the model parameters using the aggregate datasets from multiple domains. Experiments demonstrate that the proposed model outperforms the compared methods in the task of refining coarse-grained aggregate data on real-world datasets: time series of air pollutants in Beijing and various kinds of spatial datasets from New York City and Chicago.
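To make the two core ingredients concrete (linear mixing of latent GPs and observation through aggregation over supports), the following is a minimal one-dimensional sketch in NumPy. It is not the authors' implementation: the mixing weights `W` are fixed by hand rather than given a prior, the integral over each support is approximated by an average over grid points, and exact Gaussian conditioning replaces variational Bayes. All names, kernels, length scales, bin counts, and noise levels are assumptions chosen for illustration.

```python
import numpy as np

def rbf(x, lengthscale):
    """Squared-exponential kernel matrix on a 1-D grid."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def averaging_matrix(x, edges):
    """Row r averages the grid points inside [edges[r], edges[r+1])."""
    A = np.zeros((len(edges) - 1, len(x)))
    for r in range(len(edges) - 1):
        mask = (x >= edges[r]) & (x < edges[r + 1])
        A[r, mask] = 1.0 / mask.sum()
    return A

rng = np.random.default_rng(0)
n = 60
x = (np.arange(n) + 0.5) / n                 # cell midpoints in (0, 1)

# Two independent latent GPs (hypothetical length scales).
latent_K = [rbf(x, 0.1), rbf(x, 0.3)]
# Hand-picked mixing weights: row s = attribute, column l = latent GP.
W = np.array([[1.0, 0.5],
              [0.8, -0.3]])

def cross_cov(s, t):
    """Cov(f_s(x), f_t(x')) implied by the linear mixing."""
    return sum(W[s, l] * W[t, l] * latent_K[l] for l in range(len(latent_K)))

# Attribute 0 is observed on 6 fine bins, attribute 1 on 3 coarse bins.
A = [averaging_matrix(x, np.linspace(0, 1, 7)),
     averaging_matrix(x, np.linspace(0, 1, 4))]

# Simulate ground truth from the model, then aggregate with small noise.
g = [np.linalg.cholesky(K + 1e-6 * np.eye(n)) @ rng.standard_normal(n)
     for K in latent_K]
f = [W[s, 0] * g[0] + W[s, 1] * g[1] for s in range(2)]
noise_var = 1e-4
y = np.concatenate([A[s] @ f[s] + np.sqrt(noise_var) * rng.standard_normal(A[s].shape[0])
                    for s in range(2)])

# Covariance of all aggregate observations (both attributes).
K_yy = np.block([[A[s] @ cross_cov(s, t) @ A[t].T for t in range(2)]
                 for s in range(2)]) + noise_var * np.eye(len(y))

# Cross-covariance between the fine-grained f_1 and the aggregates.
K_fy = np.hstack([cross_cov(1, t) @ A[t].T for t in range(2)])

# Posterior mean of the coarse attribute on the fine grid ("refinement").
post_mean = K_fy @ np.linalg.solve(K_yy, y)
```

Here `post_mean` is a fine-grained estimate of the coarsely observed attribute that borrows strength from the more finely observed one through the shared latent GPs. In the article's setting the weights would instead be inferred, with their prior shared across domains.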