With the proliferation of mobile devices, an increasing amount of population data is being collected, and there is growing demand to use these large-scale, multidimensional data in real-world situations. We introduced functional data analysis (FDA) into the problem of predicting the hourly population of different districts of Tokyo. FDA is a methodology that treats and analyzes longitudinal data as curves, which reduces the number of parameters and makes high-dimensional data easier to handle. Specifically, by assuming a Gaussian process, we avoided estimating the large number of covariance matrix parameters required by a multivariate normal distribution. In addition, the data exhibited both temporal dependence and spatial dependence between districts. To capture these characteristics, we introduced a Bayesian factor model that models the time series of a small number of common factors and expresses the spatial structure through factor loading matrices. Furthermore, the factor loading matrices were made identifiable and sparse to ensure the interpretability of the model. We also proposed a method for selecting factors using a Bayesian shrinkage method. We studied the forecast accuracy and interpretability of the proposed method through numerical experiments and data analysis. We found that the proposed method is flexible enough to be extended to reflect further time series features, and that this extension contributed to forecast accuracy.
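As a rough illustration of the structure described above, the following Python sketch simulates hourly curves for several districts from a few latent factor curves and a sparse loading matrix. It is not the authors' model or estimation procedure. The squared-exponential kernel, the lower-triangular identifiability constraint, the random sparsity pattern, and all function names (`rbf_kernel`, `simulate_factor_model`, `full_cov_param_count`) are assumptions made for this example.

```python
import numpy as np


def rbf_kernel(t, length_scale=3.0, variance=1.0):
    """Squared-exponential covariance between all pairs of time points (assumed kernel)."""
    diff = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def full_cov_param_count(n_hours):
    """Free parameters in an unstructured covariance matrix over n_hours time points."""
    return n_hours * (n_hours + 1) // 2


def simulate_factor_model(n_districts=10, n_factors=2, n_hours=24,
                          noise_sd=0.1, seed=0):
    """Simulate district-by-hour data as loadings @ factors + Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_hours, dtype=float)

    # Gaussian-process prior on each factor curve; jitter keeps Cholesky stable.
    K = rbf_kernel(t) + 1e-6 * np.eye(n_hours)
    chol = np.linalg.cholesky(K)
    factors = (chol @ rng.standard_normal((n_hours, n_factors))).T  # (n_factors, n_hours)

    # Loadings: lower-triangular constraint (one common identifiability device,
    # assumed here) plus random zeros to mimic sparsity.
    loadings = np.tril(rng.standard_normal((n_districts, n_factors)))
    loadings[rng.random(loadings.shape) < 0.5] = 0.0

    noise = noise_sd * rng.standard_normal((n_districts, n_hours))
    observed = loadings @ factors + noise
    return observed, loadings, factors


if __name__ == "__main__":
    observed, loadings, factors = simulate_factor_model()
    print("observed shape:", observed.shape)
    print("unstructured covariance parameters for 24 hours:", full_cov_param_count(24))
    print("kernel hyperparameters in this sketch: 2 (length_scale, variance)")
```

In this sketch the temporal dependence enters only through the two kernel hyperparameters, whereas an unstructured covariance over 24 hourly points would have 24 × 25 / 2 = 300 free parameters. The dependence between districts is carried entirely by the loading matrix.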