An Interpretable Hybrid Predictive Model of COVID-19 Cases Using Autoregressive Model and LSTM

The Coronavirus Disease 2019 (COVID-19) has had a profound impact on global health and the economy, making it crucial to build accurate and interpretable data-driven models that predict COVID-19 cases and thereby support policy making. The enormous scale of the pandemic and its intrinsically time-varying transmission characteristics pose great challenges for effective case prediction. To address these challenges, we propose a novel hybrid model that combines the interpretability of the autoregressive (AR) model with the predictive power of long short-term memory (LSTM) neural networks. The hybrid model is formalized as a neural network whose architecture connects the two component models as blocks, with their relative contributions determined data-adaptively during training. Through comprehensive numerical studies on two data sources under multiple evaluation metrics, we demonstrate that the hybrid model outperforms both of its components as well as other popular predictive models. Specifically, on county-level data from 8 California counties, the hybrid model achieves an average mean absolute percentage error (MAPE) of 4.173%, compared with 5.629% for AR and 4.934% for LSTM. On country-level data, the hybrid model outperforms the widely used predictive models AR, LSTM, support vector machine (SVM), gradient boosting, and random forest in predicting COVID-19 cases in 8 countries around the world. In addition, we illustrate the interpretability of the hybrid model, a key feature not shared by most black-box predictive models for COVID-19 cases. Our study provides a new and promising direction for building effective and interpretable data-driven models, which could have significant implications for public health policy making and for the control of current and future pandemics.
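The abstract states that the relative contributions of the AR and LSTM blocks are learned from the data during training. As a rough illustration only, the Python sketch below assumes the blocks are combined as a convex mixture ŷ = w·AR + (1 − w)·LSTM with w = sigmoid(θ). This parameterization is an assumption for illustration and is not taken from the paper. To stay self-contained, the sketch holds the two component forecasts fixed and fits only θ by gradient descent, whereas the hybrid network trains both blocks jointly with their weighting. It also computes MAPE, the metric reported above, under its standard definition. The helper names `fit_mixing_weight` and `hybrid_forecast` are hypothetical.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no zero actual values)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_mixing_weight(actual, ar_pred, lstm_pred, lr=1.0, steps=500):
    """Learn w = sigmoid(theta) so that w*AR + (1-w)*LSTM minimizes mean squared error.

    Hypothetical illustration: the component forecasts are held fixed here,
    whereas a hybrid network would update the AR/LSTM weights jointly with theta.
    """
    theta = 0.0  # w starts at 0.5, i.e. equal contribution from both blocks
    n = len(actual)
    for _ in range(steps):
        w = sigmoid(theta)
        # d(MSE)/dw for the convex combination w*ar + (1-w)*lstm
        grad_w = sum(
            2.0 * (w * a + (1.0 - w) * l - y) * (a - l)
            for y, a, l in zip(actual, ar_pred, lstm_pred)
        ) / n
        # Chain rule through the sigmoid: dw/dtheta = w * (1 - w)
        theta -= lr * grad_w * w * (1.0 - w)
    return sigmoid(theta)


def hybrid_forecast(w, ar_pred, lstm_pred):
    """Combine the two component forecasts with the learned weight w."""
    return [w * a + (1.0 - w) * l for a, l in zip(ar_pred, lstm_pred)]


if __name__ == "__main__":
    ar = [10.0, 12.0, 14.0, 16.0]
    lstm = [12.0, 11.0, 15.0, 18.0]
    y = [0.7 * a + 0.3 * l for a, l in zip(ar, lstm)]  # synthetic target
    w = fit_mixing_weight(y, ar, lstm)
    print(round(w, 3))  # → 0.7
    hybrid = hybrid_forecast(w, ar, lstm)
    print(mape(y, ar), mape(y, lstm), mape(y, hybrid))
```

Because the synthetic target is built as 0.7·AR + 0.3·LSTM, the fitted weight recovers 0.7 and the mixture's MAPE falls below that of either component. On real case counts, the inputs would typically need rescaling before this kind of gradient descent, since the fixed step size here suits small values.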

翻译：新型冠状病毒病（COVID-19）对全球卫生和经济产生了深远的影响，因此建立精准且可解释的基于数据的COVID-19病例预测模型来改进政策制定变得至关重要。疫情的极大规模和内在的传播特性变化使得有效的COVID-19病例预测面临巨大挑战。为了解决这一问题，我们提出了一种新型混合模型，将自回归模型（AR）的可解释性和长短期记忆神经网络（LSTM）的预测能力进行结合。所提出的混合模型被规范为一个神经网络，其架构连接两个组成模型块，其中相对贡献取决于在训练过程中的数据自适应性决策。我们通过对两个数据源在多个评估指标下的综合数值研究，展示了混合模型相对于其两个组成模型以及其他流行的预测模型的良好性能。具体而言，在加州的8个县级数据中，我们的混合模型平均实现4.173%的MAPE（平均绝对百分比误差），超过组成的AR（5.629%）和LSTM（4.934%）。在全球8个国家的国家级数据中，我们的混合模型优于广泛使用的预测模型-AR、LSTM、SVM、Gradient Boosting和Random Forest-预测COVID-19病例。此外，我们还展示了我们提出的混合模型的可解释性，这是绝大多数COVID-19病例黑匣子预测模型所没有的关键特性。我们的研究为构建有效且可解释的数据驱动模型提供了新的有前景的方向，这可能对公共卫生政策制定和当前及未来潜在疫情的控制产生重要影响。