In this work we evaluate the applicability of an ensemble of population models and machine learning models to predict the near-future evolution of the COVID-19 pandemic, with a particular use case in Spain. We rely solely on open and public datasets, fusing incidence, vaccination, human mobility and weather data to feed our machine learning models (Random Forest, Gradient Boosting, k-Nearest Neighbours and Kernel Ridge Regression). We use the incidence data to fit classic population models (Gompertz, Logistic, Richards, Bertalanffy) in order to better capture the trend of the data. We then ensemble these two families of models to obtain a more robust and accurate prediction. Furthermore, we observe that the predictions of the machine learning models improve as we add new features (vaccines, mobility, climatic conditions), and we analyze the importance of each feature using Shapley Additive Explanations (SHAP) values. As in any other modelling work, the quality of both the data and the predictions has several limitations, so the results must be viewed from a critical standpoint, as we discuss in the text. Our work concludes that the ensemble of these models improves on the individual predictions (using only machine learning models or only population models) and can be applied, with caution, in cases where compartmental models cannot be used due to the lack of relevant data.
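As a rough illustration of the two ingredients named above, the following minimal sketch fits a Gompertz curve to cumulative incidence and averages its forecast with a machine learning forecast. It is not the authors' implementation: the parametrization `K * exp(-b * exp(-c * t))` is one common textbook form, the fitting routine (a grid search over `K` combined with linearized least squares) is a simple stand-in for whatever optimizer the work actually uses, the plain mean is only one possible way to ensemble, and the data and the `ml_forecast` value are synthetic placeholders.

```python
import math


def gompertz(t, K, b, c):
    """Gompertz curve in one common textbook form (assumed parametrization)."""
    return K * math.exp(-b * math.exp(-c * t))


def fit_gompertz(ts, ys, K_grid):
    """Fit (K, b, c) by grid search over K plus linear least squares.

    For a fixed K, ln(-ln(y / K)) = ln(b) - c * t is linear in t,
    so b and c follow from an ordinary straight-line fit.
    Requires 0 < y < K for every observation.
    """
    n = len(ts)
    t_mean = sum(ts) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    best = None
    for K in K_grid:
        if K <= max(ys):
            continue  # the linearization needs every y strictly below K
        zs = [math.log(-math.log(y / K)) for y in ys]
        z_mean = sum(zs) / n
        sxz = sum((t - t_mean) * (z - z_mean) for t, z in zip(ts, zs))
        slope = sxz / sxx
        intercept = z_mean - slope * t_mean
        b, c = math.exp(intercept), -slope
        # Score each candidate K by squared error on the original scale.
        sse = sum((gompertz(t, K, b, c) - y) ** 2 for t, y in zip(ts, ys))
        if best is None or sse < best[0]:
            best = (sse, K, b, c)
    if best is None:
        raise ValueError("no K in K_grid exceeds the largest observation")
    return best[1], best[2], best[3]


def ensemble_mean(predictions):
    """Combine forecasts with a plain average (one simple ensembling choice)."""
    return sum(predictions) / len(predictions)


# Synthetic cumulative-incidence series generated from known parameters.
ts = list(range(20))
ys = [gompertz(t, 1000.0, 5.0, 0.3) for t in ts]

K_hat, b_hat, c_hat = fit_gompertz(ts, ys, K_grid=range(1000, 2001, 100))
pop_forecast = gompertz(25, K_hat, b_hat, c_hat)

# Hypothetical placeholder for the output of a trained machine learning model.
ml_forecast = 990.0
combined = ensemble_mean([pop_forecast, ml_forecast])
```

In practice the population-model parameters would be estimated from real incidence data with a nonlinear optimizer, and `ml_forecast` would come from a regressor trained on the incidence, vaccination, mobility and weather features.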