The use of administrative patient-care data, such as electronic health records and medical and pharmaceutical claims, for population-based scientific research has become increasingly common. Because the vast sample sizes yield very small standard errors, researchers need to pay closer attention to potential biases in the estimates of association parameters of interest, specifically to biases that do not diminish as the sample size increases. Of the multiple sources of bias, this paper focuses on selection bias. We present an analytic framework based on directed acyclic graphs to guide applied researchers in dissecting how different sources of selection bias may affect the estimates of their parameters of interest. We review four easy-to-implement weighting approaches for reducing selection bias and explain, through a simulation study and an analysis of real-world data, when they can be expected to help in practice. We provide annotated R code to implement these methods.