Modeling Atmospheric Data and Identifying Dynamics: Temporal Data-Driven Modeling of Air Pollutants

Atmospheric modeling has recently experienced a surge with the advent of deep learning. Most of these models, however, predict pollutant concentrations with a purely data-driven approach in which the physical laws that govern the pollutants' behavior and relationships remain hidden. Using real-world air quality data collected hourly at different stations throughout Madrid, we present an empirical approach based on data-driven techniques with the following goals: (1) find parsimonious systems of ordinary differential equations (ODEs), via sparse identification of nonlinear dynamics (SINDy), that model the concentration of pollutants and their changes over time; (2) assess the performance and limitations of our models using stability analysis; (3) reconstruct the time series of chemical pollutants not measured at certain stations using delay coordinate embedding results. Our results show that Akaike's Information Criterion can work well in conjunction with best subset regression to find a balance between sparsity and goodness of fit. We also find that, due to the complexity of the chemical system under study, identifying the dynamics of this system over longer periods of time requires higher levels of data filtering and smoothing. Stability analysis of the reconstructed ODEs reveals that more than half of the physically relevant critical points are saddle points, suggesting that the system is unstable even under the idealized assumption that all environmental conditions are constant over time.
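
To make the combination of SINDy-style regression, best subset selection, and Akaike's Information Criterion concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a synthetic two-variable linear system (dx/dt = -0.5x + 2y, dy/dt = -2x - 0.5y) in place of the Madrid measurements, a hypothetical polynomial library up to degree 2, and finite-difference derivatives of lightly noised states. The function names `build_library`, `aic`, and `best_subset_aic` are illustrative and not taken from the paper or any library.

```python
import itertools
import numpy as np


def build_library(x, y):
    """Candidate terms up to degree 2 for a two-variable system (assumed library)."""
    names = ["1", "x", "y", "x^2", "x*y", "y^2"]
    theta = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])
    return names, theta


def aic(rss, n, k):
    """AIC for a Gaussian least-squares fit, up to an additive constant."""
    return n * np.log(rss / n) + 2 * k


def best_subset_aic(theta, target):
    """Fit every non-empty subset of library columns and keep the lowest-AIC one."""
    n, p = theta.shape
    best = None
    for k in range(1, p + 1):
        for subset in itertools.combinations(range(p), k):
            cols = list(subset)
            coef, *_ = np.linalg.lstsq(theta[:, cols], target, rcond=None)
            rss = float(np.sum((target - theta[:, cols] @ coef) ** 2))
            score = aic(rss, n, k)
            if best is None or score < best[0]:
                best = (score, cols, coef)
    return best


# Synthetic "measurements": exact solution of the toy system plus small noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 401)
x = np.exp(-0.5 * t) * np.cos(2 * t) + 1e-3 * rng.standard_normal(t.size)
y = -np.exp(-0.5 * t) * np.sin(2 * t) + 1e-3 * rng.standard_normal(t.size)

# Derivatives estimated from the data by finite differences.
dx = np.gradient(x, t)
dy = np.gradient(y, t)

names, theta = build_library(x, y)
models = {}
for label, target in (("dx/dt", dx), ("dy/dt", dy)):
    score, cols, coef = best_subset_aic(theta, target)
    models[label] = {names[c]: float(v) for c, v in zip(cols, coef)}
    terms = " + ".join(f"{v:.3f}*{names[c]}" for c, v in zip(cols, coef))
    print(f"{label} = {terms}   (AIC = {score:.1f})")
```

Exhaustive enumeration is only feasible for small libraries (here 63 subsets per equation). Because finite differences amplify measurement noise, AIC may occasionally admit a small spurious term in this sketch, which is consistent with the abstract's remark that longer records need more filtering and smoothing.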
