The COVID-19 pandemic has highlighted the importance of epidemic forecasting for decision makers in multiple domains, ranging from public health to the economy as a whole. Although forecasting epidemic progression is frequently likened to weather forecasting, it differs in key respects and remains a non-trivial task. The spread of disease is subject to multiple confounding factors spanning human behavior, pathogen dynamics, and weather and environmental conditions. Research interest has been fueled both by the increased availability of rich data sources capturing previously unobservable facets and by initiatives from government public health and funding agencies. This has resulted, in particular, in a spate of work on 'data-centered' solutions, which have shown potential to enhance our forecasting capabilities by leveraging non-traditional data sources as well as recent innovations in AI and machine learning. This survey examines various data-driven methodological and practical advancements and introduces a conceptual framework for navigating them. First, we enumerate the many epidemiological datasets and novel data streams relevant to epidemic forecasting, spanning sources such as online symptom surveys, retail and commerce, mobility, genomics data and more. Next, we discuss methods and modeling paradigms, focusing on recent data-driven statistical and deep-learning-based methods as well as on the novel class of hybrid models that combine the domain knowledge of mechanistic models with the effectiveness and flexibility of statistical approaches. We also discuss experiences and challenges that arise in the real-world deployment of these forecasting systems, including decision-making informed by forecasts. Finally, we highlight challenges and open problems found across the forecasting pipeline.