Observational time series data are increasingly used to assess the impact of multi-time point interventions as more health and activity data are collected and digitized via wearables, social media, and electronic health records. Such time series may involve hundreds or thousands of irregularly sampled observations. A common approach is to discretize each time series into a sequence and then apply a discrete-time estimation method that adjusts for time-dependent confounding. In certain settings, this discretization results in sequences with many time points; however, the empirical properties of longitudinal causal estimators have not been systematically compared on long sequences. We compare three representative longitudinal causal estimation methods on simulated and real clinical data. Our simulations and analyses assume a Markov structure and binary-valued longitudinal treatments/exposures with at most a single jump point. We identify sources of bias that arise from temporally discretizing the data and provide practical guidance for discretizing data and choosing between methods when working with long sequences. On the real data, drawn from electronic health records, we evaluate the impact of early treatment for patients with sepsis, a life-threatening complication of infection.
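As a rough illustration of the discretization step described above, the following Python sketch bins one irregularly sampled series into fixed-width intervals and encodes a binary treatment with at most a single jump point. The function name `discretize`, its parameters, the half-open interval convention, and the last-observation-carried-forward rule are all assumptions made for this example; they are not taken from the paper.

```python
import math
from bisect import bisect_left


def discretize(obs_times, obs_values, treat_time, horizon, width):
    """Bin one irregularly sampled series into fixed-width intervals.

    Hypothetical helper for illustration only.
    obs_times must be sorted ascending; treat_time is None if never treated.
    Interval k covers [k * width, (k + 1) * width).
    Returns (covariates, treatments), one entry per interval.
    """
    n_bins = math.ceil(horizon / width)
    covariates, treatments = [], []
    for k in range(n_bins):
        end = (k + 1) * width
        # Index of the last observation taken strictly before the interval ends.
        idx = bisect_left(obs_times, end) - 1
        # Carry the last observation forward; None if nothing observed yet.
        covariates.append(obs_values[idx] if idx >= 0 else None)
        # Binary treatment with a single jump: 0 until it starts, then 1.
        treated = treat_time is not None and treat_time < end
        treatments.append(1 if treated else 0)
    return covariates, treatments


times = [0.5, 1.2, 3.7, 3.9]
values = [10, 12, 9, 8]

coarse = discretize(times, values, treat_time=3.8, horizon=6, width=2)
fine = discretize(times, values, treat_time=3.8, horizon=6, width=1)
print(coarse)
print(fine)
```

With the coarser width, the value recorded at time 3.9 (after treatment began at 3.8) ends up in the same interval as the treatment onset, so the within-interval ordering of covariate and treatment is no longer visible in the sequence. This is meant only to show how interval width changes what the discretized sequence retains, not to reproduce the specific bias sources the paper identifies.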