Spatial Matrix Completion for Spatially-Misaligned and High-Dimensional Air Pollution Data

From arXiv: 26 pages, 5 figures, 5 tables, 1 supplemental file (available upon request). This v2 is a pre-peer-review version that was submitted to Environmetrics. A final version with minor revisions was accepted for publication by Environmetrics on Dec 13, 2021, and will be linked to this version once published.

In health-pollution cohort studies, accurate predictions of pollutant concentrations at new locations are needed, since the locations of fixed monitoring sites and study participants are often spatially misaligned. For multi-pollutant data, principal component analysis (PCA) is often used to obtain a low-rank (LR) structure of the data prior to spatial prediction. The recently developed predictive PCA modifies the traditional algorithm to improve overall predictive performance by leveraging both the LR and spatial structures within the data. However, predictive PCA requires complete data or an initial imputation step. Nonparametric imputation techniques that do not account for spatial information may distort the underlying structure of the data and thus further reduce predictive performance. We propose a convex optimization problem inspired by the LR matrix completion framework and develop a proximal algorithm to solve it. Missing data are imputed and handled concurrently within the algorithm, which eliminates the need for a separate imputation step. We show that our algorithm has a low computational burden and leads to reliable predictive performance as the severity of missing data increases.
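To make the idea of "LR matrix completion solved by a proximal algorithm" concrete, the following is a minimal sketch of the generic nuclear-norm formulation, minimizing 0.5 * ||P_obs(Y - X)||_F^2 + lam * ||X||_* by proximal gradient descent with singular value thresholding. It is not the authors' method: the abstract gives neither their objective nor their spatial terms, so the function names `svt` and `complete_matrix`, the penalty `lam`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def svt(Z, tau):
    """Proximal operator of tau * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt

def complete_matrix(Y, observed, lam=0.1, n_iter=500, tol=1e-8):
    """Generic nuclear-norm matrix completion by proximal gradient (illustrative only).

    Y        : 2-D array; values at unobserved positions are ignored (may be NaN)
    observed : boolean mask, True where Y is observed
    lam      : nuclear-norm penalty weight (hypothetical default)
    """
    Y_filled = np.where(observed, Y, 0.0)      # neutralize NaNs at missing cells
    X = np.zeros_like(Y_filled, dtype=float)
    for _ in range(n_iter):
        # Gradient of the data-fit term; it is nonzero only on observed entries.
        grad = np.where(observed, X - Y_filled, 0.0)
        # Step size 1 is valid because the gradient is 1-Lipschitz.
        X_new = svt(X - grad, lam)
        converged = np.linalg.norm(X_new - X) <= tol * max(1.0, np.linalg.norm(X))
        X = X_new
        if converged:
            break
    return X

# Simulated example: a rank-2 "sites x pollutants" matrix with 20% of entries missing.
rng = np.random.default_rng(0)
truth = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 10))
mask = rng.random(truth.shape) > 0.2
Y = np.where(mask, truth, np.nan)
X_hat = complete_matrix(Y, mask, lam=0.1)
rel_err = np.linalg.norm((X_hat - truth)[~mask]) / np.linalg.norm(truth[~mask])
print("relative error on missing entries:", rel_err)
```

In this sketch the missing entries are filled in as a by-product of the optimization, with no separate imputation step, which mirrors the property the abstract highlights. A spatially informed variant would add structure that this generic version does not have.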

PCA

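In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a set of points in two, three, or more dimensions, a "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.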
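The description above can be illustrated with a short sketch that computes principal components from the singular value decomposition of the centered data. The function name `pca` and the simulated data are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the column-centered data matrix (rows are observations)."""
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # orthonormal directions, one per row
    scores = Xc @ components.T                   # data projected onto those directions
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    return components, scores, explained_variance

# Simulated example: 200 observations of 5 correlated variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))
components, scores, variances = pca(X, n_components=2)
print("variance captured by each component:", variances)
```

The rows of `components` are mutually orthogonal, and the columns of `scores` are uncorrelated, which matches the orthogonal, uncorrelated basis described in the text.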
