COVID-19 has been a public health emergency of international concern since early 2020, and reliable forecasting is critical to mitigating its impact. A large number of forecasting models have been proposed to date, mainly statistical models, compartmental models, and deep learning models. However, because of region-specific uncertainties such as economic conditions and government policy, no single forecasting model appears to be best for all scenarios. In this paper, we perform a quantitative analysis of forecasting COVID-19 confirmed cases and deaths across different regions of the United States over different forecasting horizons. Using several evaluation metrics, we evaluate the relative impact of three dimensions on predictive performance, in terms of both improvement and variation: model selection, hyperparameter tuning, and the length of the time series used for training. We find that a dimension capable of bringing larger performance gains can also incur harsher performance penalties when it is not well tuned. Furthermore, model selection is the dominant factor in predictive performance: it accounts for both the largest improvement and the largest variation in performance in every prediction task across all regions. While practitioners may perform more complicated time series analyses in practice, they should be able to achieve reasonable results if they have adequate insight into key decisions such as model selection.
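The abstract does not say exactly how "improvement" and "variation" are computed for each dimension. One plausible reading is a one-at-a-time sensitivity analysis: hold two dimensions at a baseline configuration, vary the third across its options, and record how much the best option beats the baseline (improvement) and how far apart the best and worst options are (variation). The sketch below assumes that reading. The option names (`arima`, `seir`, `lstm`, `default`/`tuned`, training lengths of 30/60/90 days), the helper `dimension_impact`, and all error values are hypothetical toy inputs, not data or results from this paper.

```python
from itertools import product

# Hypothetical option sets for the three dimensions studied (illustrative only).
DIMENSIONS = {
    "model": ["arima", "seir", "lstm"],      # statistical, compartmental, deep learning
    "hyperparams": ["default", "tuned"],
    "train_length": [30, 60, 90],            # days of history used for training
}


def dimension_impact(errors, baseline):
    """One-at-a-time sensitivity analysis (an assumed definition, not the paper's).

    errors:   dict mapping (model, hyperparams, train_length) -> forecast error,
              where lower is better (e.g. MAE)
    baseline: tuple giving the baseline choice for each dimension
    Returns {dimension: (improvement, variation)}.
    """
    names = list(DIMENSIONS)
    base_err = errors[baseline]
    impact = {}
    for i, name in enumerate(names):
        # Vary only dimension i; every other dimension stays at its baseline value.
        errs = []
        for option in DIMENSIONS[name]:
            config = list(baseline)
            config[i] = option
            errs.append(errors[tuple(config)])
        improvement = base_err - min(errs)  # gain of the best option over the baseline
        variation = max(errs) - min(errs)   # spread between the best and worst options
        impact[name] = (improvement, variation)
    return impact


# Toy, made-up error contributions so the example is deterministic.
model_err = {"arima": 10.0, "seir": 14.0, "lstm": 8.0}
hp_err = {"default": 2.0, "tuned": 0.0}
len_err = {30: 1.5, 60: 0.5, 90: 1.0}
errors = {
    (m, h, n): model_err[m] + hp_err[h] + len_err[n]
    for m, h, n in product(*DIMENSIONS.values())
}

baseline = ("arima", "default", 60)
impact = dimension_impact(errors, baseline)
print(impact)
# {'model': (2.0, 6.0), 'hyperparams': (2.0, 2.0), 'train_length': (0.0, 1.0)}
```

With these toy numbers the model dimension shows the largest variation, which mirrors the shape of the finding described in the abstract only because the inputs were chosen that way. A real analysis would plug in errors from actual backtests and might average over regions, horizons, and baselines rather than using a single baseline.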