Real-time air pollution monitoring is a valuable tool for public health and environmental surveillance. In recent years, research on air pollution forecasting and monitoring with artificial neural networks (ANNs) has increased dramatically. Most prior work modeled pollutant concentrations collected from ground-based monitors, together with meteorological data, for long-term forecasting of outdoor ozone, oxides of nitrogen, and PM2.5. Because traditional, highly sophisticated air quality monitors are expensive and not universally available, these models cannot adequately serve people who do not live near pollutant monitoring sites. Furthermore, because prior models were built on physical measurements collected from sensors, they may not be suitable for predicting the public health effects of pollution exposure. This study aims to develop and validate models that nowcast observed pollution levels using Web search data, which is publicly available in near real-time from major search engines. We developed novel machine learning-based models, using both traditional supervised classification methods and state-of-the-art deep learning methods, to detect elevated air pollution levels at the U.S. city level from generally available meteorological data and aggregate Web-based search volume data derived from Google Trends. We validated the performance of these methods by predicting the levels of three critical air pollutants, ozone (O3), nitrogen dioxide (NO2), and fine particulate matter (PM2.5), across ten major U.S. metropolitan statistical areas (MSAs) in 2017 and 2018.
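The detection task described in the abstract, classifying a city-day as having elevated pollution from meteorological and search-volume features, can be illustrated with a minimal sketch. This is not the authors' model: a plain logistic regression stands in for the "traditional supervised classification methods", the data are synthetic and generated only for illustration, and the feature names (`temperature`, `wind_speed`, `search_volume`) and all coefficients are assumptions rather than values from the study.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_days(n, seed=0):
    """Generate n synthetic (features, label) pairs.

    Purely illustrative: the features are standardized random draws and the
    label rule below is an assumption, not a relationship from the study.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        temperature = rng.gauss(0.0, 1.0)    # hypothetical meteorological feature
        wind_speed = rng.gauss(0.0, 1.0)     # hypothetical meteorological feature
        search_volume = rng.gauss(0.0, 1.0)  # hypothetical search-interest feature
        score = (1.5 * temperature - 1.0 * wind_speed
                 + 2.0 * search_volume + rng.gauss(0.0, 0.5))
        label = 1 if score > 1.0 else 0      # 1 = "elevated pollution" day
        rows.append(([temperature, wind_speed, search_volume], label))
    return rows


def train_logistic(rows, epochs=500, lr=0.5):
    """Fit logistic regression by full-batch gradient descent."""
    n_features = len(rows[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


def predict(w, b, x):
    # Classify a day as elevated (1) when the predicted probability is >= 0.5.
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


def accuracy(w, b, rows):
    return sum(predict(w, b, x) == y for x, y in rows) / len(rows)


train_rows = make_synthetic_days(400, seed=0)
test_rows = make_synthetic_days(200, seed=1)
weights, bias = train_logistic(train_rows)
test_accuracy = accuracy(weights, bias, test_rows)
print(f"held-out accuracy on synthetic data: {test_accuracy:.2f}")
```

With real data, the synthetic generator would be replaced by daily meteorological observations and search-interest series for each metropolitan area, and the labels by thresholded monitor readings for the pollutant of interest. How those inputs are obtained and thresholded is not specified in the abstract and is left open here.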