As COVID-19 ravages the globe, accurate forecasts of its spread are crucial for situational awareness, resource allocation, and public health decision-making. As an alternative to the traditional disease surveillance data collected by the United States (US) Centers for Disease Control and Prevention (CDC), big data from the Internet, such as online search volumes, has previously been shown to contain valuable information for tracking infectious disease dynamics. In this study, we evaluate the feasibility of using the Internet search volume of relevant queries to track and predict the COVID-19 pandemic. We found a strong association between the COVID-19 death trend and the search volume of symptom-related queries such as "loss of taste". We then develop an influenza-tracking model to predict COVID-19 deaths over the next 2 weeks at the US national level by combining search volume information with COVID-19 time series information. Encouraged by a 45% error reduction at the national level compared with the baseline time series model, we additionally build state-level COVID-19 death models, leveraging a cross-state, cross-resolution spatial-temporal framework that pools information from search volumes and COVID-19 reports across states, regions, and the nation. These variants of ARGOX are then aggregated in a winner-takes-all ensemble to produce the final state-level 2-week forecasts. Numerical experiments demonstrate that our method consistently outperforms time series baseline models and achieves state-of-the-art performance among publicly available benchmark models. Overall, we show that disease dynamics and relevant public search behaviors co-evolve during the COVID-19 pandemic, and that capturing their dependencies while leveraging historical cases/deaths and spatial-temporal cross-region information enables stable and accurate US national and state-level forecasts.
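The abstract does not say how the winner-takes-all ensemble chooses among the ARGOX variants. The sketch below is one plausible reading. It assumes that each variant is scored by its mean absolute error (MAE) over a short window of recent weeks and that the best-scoring variant's forecast is used unchanged. The function name `winner_takes_all`, the MAE criterion, the `window` length, and the variant labels in the example are all hypothetical illustrations, not the authors' implementation.

```python
from statistics import mean


def winner_takes_all(history_forecasts, history_actuals, new_forecasts, window=4):
    """Hypothetical winner-takes-all selection among forecasting variants.

    history_forecasts: dict mapping variant name -> list of past forecasts,
        aligned with history_actuals.
    history_actuals: list of realized values (e.g., weekly deaths).
    new_forecasts: dict mapping variant name -> forecast for the next period.
    window: number of most recent periods used to score each variant
        (an assumed tuning choice, not taken from the source).

    Returns (best_variant_name, best_variant_forecast).
    """
    recent_actuals = history_actuals[-window:]

    def recent_mae(name):
        # Mean absolute error of this variant over the most recent window.
        recent_preds = history_forecasts[name][-window:]
        return mean(abs(p - a) for p, a in zip(recent_preds, recent_actuals))

    # The single variant with the lowest recent error "takes all".
    best = min(history_forecasts, key=recent_mae)
    return best, new_forecasts[best]


# Illustrative (made-up) numbers for one state; variant labels are hypothetical.
past = {
    "ARGOX-state": [100, 110, 120, 130],
    "ARGOX-region": [90, 105, 125, 140],
    "AR-baseline": [80, 95, 100, 110],
}
actual = [98, 112, 118, 133]
upcoming = {"ARGOX-state": 140, "ARGOX-region": 150, "AR-baseline": 120}

print(winner_takes_all(past, actual, upcoming))  # ('ARGOX-state', 140)
```

In this made-up example the recent MAEs are 2.25, 7.25, and 19.0, so the state-level variant is selected. Picking a single winner per state and period, rather than averaging the variants, lets each state follow whichever information source has recently tracked it best.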