Streaming data routinely generated by mobile phones, social networks, e-commerce, and electronic health records present new opportunities for near real-time surveillance of the impact of an intervention on an outcome of interest via causal inference methods. However, as data grow rapidly in volume and velocity, storing and combining data become increasingly challenging. The time and effort spent updating analyses can grow exponentially, which defeats the purpose of instantaneous surveillance. Data-sharing barriers in multi-center studies pose additional challenges to rapid signal detection and updating. It is thus time to move from static causal inference to online causal learning that can incorporate new information as it becomes available without revisiting prior observations. In this paper, we present a framework for online estimation and inference of treatment effects that leverages a series of sequentially arriving datasets without storing or re-accessing individual-level raw data. We establish the consistency and asymptotic normality of the proposed online estimator. In particular, our framework is robust to biased data batches in the sense that the proposed online estimator is asymptotically unbiased as long as the pooled data are a random sample of the target population, regardless of whether each individual batch is. We also provide an R package for analyzing streaming observational data that is substantially more computationally efficient than existing software packages for offline analysis. The proposed methods are illustrated with extensive simulations and an application to sequential monitoring of adverse events following COVID-19 vaccination.
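To make the idea of updating an estimate without re-accessing raw data concrete, the following is a minimal sketch and not the estimator proposed in this paper. It assumes a randomized binary treatment (so no confounding adjustment is needed) and keeps only per-arm running sums between batches. The class names `ArmSummary` and `OnlineDifferenceInMeans` are hypothetical and chosen for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmSummary:
    """Running summary statistics for one treatment arm (no raw data kept)."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, y: float) -> None:
        self.n += 1
        self.total += y
        self.total_sq += y * y

    @property
    def mean(self) -> float:
        return self.total / self.n

    @property
    def variance(self) -> float:
        # Unbiased sample variance recovered from the running sums.
        return (self.total_sq - self.n * self.mean ** 2) / (self.n - 1)


class OnlineDifferenceInMeans:
    """Illustrative online effect estimate under an ASSUMED randomized design."""

    def __init__(self) -> None:
        self.treated = ArmSummary()
        self.control = ArmSummary()

    def update(self, batch) -> None:
        # batch: iterable of (treatment, outcome) pairs; it can be discarded
        # after this call because only the summaries are retained.
        for a, y in batch:
            (self.treated if a == 1 else self.control).add(y)

    def estimate(self):
        # Point estimate and standard error from the accumulated summaries.
        effect = self.treated.mean - self.control.mean
        se = math.sqrt(
            self.treated.variance / self.treated.n
            + self.control.variance / self.control.n
        )
        return effect, se
```

Because the summaries are simple sums, the estimate after the last batch coincides with the one an offline analysis of the pooled data would give, even when an individual batch is unrepresentative. Observational data would additionally require adjustment for confounding, which this sketch deliberately omits.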