In real-time forecasting for public health, data collection is a non-trivial and demanding task. After their initial release, the data often undergo several revisions (for example, because of human or technical constraints); as a result, it may take weeks for the data to reach a stable value. This so-called 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature. In this paper, we introduce the multi-variate backfill problem using COVID-19 as the motivating example. We construct a detailed dataset composed of relevant signals over the past year of the pandemic. We then systematically characterize several patterns in backfill dynamics and leverage our observations to formulate a novel problem and a neural framework, Back2Future, that aims to refine a given model's predictions in real time. Our extensive experiments demonstrate that our method refines the performance of top models for COVID-19 forecasting, in contrast to non-trivial baselines, yielding an 18% improvement over baselines and enabling us to obtain new state-of-the-art (SOTA) performance. In addition, we show that our model improves model evaluation too; hence, policy-makers can better understand the true accuracy of forecasting models in real time.