Forecasting particulate matter (PM) concentrations in South Korea has become urgent because of their strong negative impact on human life. Most statistical and machine learning methods assume independent and identically distributed data, for example, data drawn from a Gaussian distribution; however, time series such as air pollution and weather data do not meet this assumption. This study applies the maximum correntropy criterion for regression (MCCR) loss in an analysis of the statistical characteristics of air pollution and weather data. The air pollution and weather data were rigorously seasonally adjusted because of their complex seasonality patterns, and their distribution remained heavy-tailed even after deseasonalization. The MCCR loss was applied to multiple models, including conventional statistical models and state-of-the-art machine learning models. The results show that the MCCR loss is more appropriate than the conventional mean squared error (MSE) loss for forecasting extreme values.
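To make the contrast between the two losses concrete, the following is a minimal sketch rather than the implementation used in the study. It assumes the commonly cited correntropy-induced form of the MCCR loss, sigma^2 * (1 - exp(-r^2 / sigma^2)) for a residual r; the abstract does not state the formula. The function names `mse_loss` and `mccr_loss`, the scale parameter `sigma`, and the sample numbers are all hypothetical.

```python
import numpy as np


def mse_loss(y_true, y_pred):
    """Mean squared error: grows quadratically with the residual."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(r ** 2))


def mccr_loss(y_true, y_pred, sigma=1.0):
    """Correntropy-induced loss (assumed form, not taken from the abstract).

    Behaves like the squared error for |r| much smaller than sigma and
    saturates at sigma**2 for very large residuals, so a single
    heavy-tailed observation cannot dominate the average.
    """
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(sigma ** 2 * (1.0 - np.exp(-(r ** 2) / sigma ** 2))))


if __name__ == "__main__":
    # Hypothetical observations: the last one is an extreme value.
    y_true = np.array([20.0, 22.0, 21.0, 95.0])
    y_pred = np.array([21.0, 21.0, 22.0, 30.0])
    print("MSE :", mse_loss(y_true, y_pred))
    print("MCCR:", mccr_loss(y_true, y_pred, sigma=10.0))
```

Under this assumed form, `sigma` controls where the loss switches from quadratic to saturated behavior, so in practice it would be tuned rather than fixed.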