With their continued increase in coverage and quality, data collected from personal air quality monitors have become an increasingly valuable tool to complement existing public health monitoring systems over urban areas. However, the potential of using such "citizen science data" for automatic early warning systems is hampered by the lack of models able to capture the high-resolution, nonlinear spatio-temporal features stemming from local emission sources such as traffic, residential heating and commercial activities. In this work, we propose a machine learning approach to forecast high-frequency spatial fields which has two distinctive advantages over standard neural network methods in time: 1) sparsity of the neural network via a spike-and-slab prior, and 2) a small parameter space. The introduction of stochastic neural networks generates additional uncertainty, and in this work we propose a fast approach to ensure that the forecast uncertainty is correctly assessed (calibration), both marginally and spatially. We focus on assessing exposure to urban air pollution in San Francisco, and our results suggest an improvement of over 58% in the mean squared error over a standard time series approach, with a calibrated forecast for up to 5 days.