Databases derived from electronic health records (EHRs) are commonly subject to left truncation, a type of selection bias that arises because patients must survive long enough to satisfy certain entry criteria. Standard methods to adjust for left truncation bias rely on an assumption of marginal independence between entry and survival times, which may not hold in practice. In this work, we examine how a weaker assumption of conditional independence can still yield unbiased estimation of common statistical parameters. In particular, we show that conditional parameters are estimable from a truncated dataset, and that marginal parameters are estimable when reference data provide non-truncated information on confounders. The latter complements observational causal inference methodology applied to real-world external comparators, a common use case for real-world databases. We implement our proposed methods in simulation studies, demonstrating unbiased estimation and valid statistical inference. We also illustrate estimation of a survival distribution under conditionally independent left truncation in a real-world clinico-genomic database.
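To make the idea of conditionally independent left truncation concrete, the following is a minimal simulation sketch and not the estimator proposed in this work. It assumes a single binary confounder `z`, exponential survival times, no censoring, and an entry-time distribution that depends on `z` only; all names, rates, and sample sizes are hypothetical choices made for illustration. Within each stratum of `z`, a product-limit estimate with delayed-entry risk sets is computed, and the stratum estimates are then averaged using the confounder distribution from a separate, non-truncated reference sample.

```python
import numpy as np

def truncated_km(entry, time, t):
    """Product-limit estimate of S(t) with delayed-entry risk sets (no censoring)."""
    surv = 1.0
    for u in np.unique(time[time <= t]):
        # At risk at u: entered the cohort before u and still alive at u.
        at_risk = np.sum((entry < u) & (time >= u))
        deaths = np.sum(time == u)
        surv *= 1.0 - deaths / at_risk
    return surv

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data-generating process: z affects both survival and entry,
# so entry and survival are dependent marginally but independent given z.
z = rng.binomial(1, 0.5, size=n)
time = rng.exponential(1.0 / np.where(z == 1, 1.5, 0.5))
width = np.where(z == 1, 3.0, 1.0)
# 30% of patients enter at time 0, which keeps early risk sets from being tiny.
entry = np.where(rng.random(n) < 0.3, 0.0, rng.uniform(0.0, width))

# Left truncation: a patient is observed only if still alive at entry.
keep = entry < time

t0 = 1.0
truth = 0.5 * np.exp(-0.5 * t0) + 0.5 * np.exp(-1.5 * t0)

# Naive estimate: ignores truncation entirely.
naive = np.mean(time[keep] > t0)

# Reference sample: non-truncated draws of the confounder only (assumed available).
z_ref = rng.binomial(1, 0.5, size=5_000)

# Standardize stratum-specific estimates to the reference confounder distribution.
standardized = sum(
    np.mean(z_ref == k)
    * truncated_km(entry[keep & (z == k)], time[keep & (z == k)], t0)
    for k in (0, 1)
)

print(f"truth={truth:.3f} naive={naive:.3f} standardized={standardized:.3f}")
```

In this setup the naive estimate overstates survival because short-lived patients are under-represented in the truncated sample, while the stratified and standardized estimate should land close to the true marginal survival probability. Censoring, continuous confounders, and variance estimation are deliberately left out of the sketch.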