During the COVID-19 pandemic, significant effort has gone into developing ML-driven epidemic forecasting techniques. However, no benchmarks exist for determining whether a new AI/ML technique is better than existing ones. The "covid-forecast-hub" is a collection of more than 30 teams, including us, that submit forecasts to the CDC weekly. These forecasts cannot be used to determine whether one method is better than another. Each team's submissions may correspond to different techniques over time and may involve human intervention, because teams continuously change and tune their approaches. Such forecasts may be considered "human-expert" forecasts and do not qualify as AI/ML approaches, although they can serve as an indicator of human-expert performance. We are interested in supporting AI/ML research in epidemic forecasting that can lead to scalable forecasting without human intervention. Which modeling techniques, learning strategies, and data pre-processing techniques work well for epidemic forecasting remains an open problem. To help advance the state of the art in AI/ML applied to epidemiology, a benchmark with a collection of performance points is needed, and the current state-of-the-art techniques need to be identified. We propose EpiBench, a platform consisting of community-driven benchmarks for AI/ML applied to epidemic forecasting, to standardize the challenge with a uniform evaluation protocol. In this paper, we introduce a prototype of EpiBench, which is currently running and accepting submissions for the task of forecasting COVID-19 cases and deaths in US states. We demonstrate that the prototype can be used to develop an ensemble relying on fully automated epidemic forecasts (no human intervention) that reaches the performance level of the human-expert ensemble currently used by the CDC.