Background. Forecasting the timing of a forthcoming pandemic helps reduce the impact of disease by enabling precautionary steps such as public health messaging and raising awareness among doctors. With the continuous and rapid increase in the cumulative incidence of COVID-19, the research community is using statistical and outbreak prediction models, including various machine learning (ML) models, to track and predict the trend of the epidemic and to develop appropriate strategies to combat and manage its spread. Methods. In this paper, we present a comparative analysis of several ML approaches, namely Support Vector Machine, Random Forest, K-Nearest Neighbor and Artificial Neural Network, for predicting the COVID-19 outbreak in the epidemiological domain. We first apply the autoregressive distributed lag (ARDL) method to identify and model the short- and long-run relationships in the COVID-19 time-series datasets. That is, we determine the lags between a response variable and its explanatory time-series variables. The significant variables, at their identified lags, are then used in the regression model selected by the ARDL procedure to forecast the trend of the epidemic. Results. Model accuracy is assessed with three statistical measures: Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE). The MAPE values of the best selected models for confirmed, recovered and death cases are 0.407, 0.094 and 0.124 respectively, which fall in the category of highly accurate forecasts. In addition, we computed fifteen-day-ahead forecasts for daily deaths, recovered cases and confirmed cases; the forecast counts fluctuated over time in all three series. The results also indicate the advantages of ML algorithms for supporting decision making on evolving short-term policies.
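For readers unfamiliar with the three accuracy measures named in the Results, the following is a minimal sketch of how RMSE, MAE and MAPE are commonly computed. The function names and the sample numbers are illustrative assumptions, not the paper's data or code, and MAPE is expressed here as a percentage, which may differ from the scaling used for the reported values.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute errors."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Undefined when any actual value is zero, so that case is rejected.
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Hypothetical daily case counts and forecasts, for illustration only.
observed = [100, 200, 400]
forecast = [110, 190, 400]

print(f"RMSE = {rmse(observed, forecast):.3f}")
print(f"MAE  = {mae(observed, forecast):.3f}")
print(f"MAPE = {mape(observed, forecast):.3f}%")
```

RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free, which makes it convenient for comparing series of very different magnitudes such as confirmed, recovered and death counts.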