Propensity scores are commonly used to reduce confounding bias when estimating the average treatment effect in non-randomized observational studies. An important assumption underlying this approach is that all confounders, that is, variables associated with both the treatment and the outcome of interest, are measured and included in the propensity score model. In the absence of strong prior knowledge about potential confounders, researchers may want to adjust agnostically for a high-dimensional set of pre-treatment variables. A variable selection procedure is therefore needed for propensity score estimation. In addition, recent studies show that including variables related only to the treatment in the propensity score model may inflate the variance of the treatment effect estimates, while including variables that are predictive of only the outcome can improve efficiency. In this paper, we propose a flexible approach to incorporating the outcome-covariate relationship in the propensity score model by including the predicted binary outcome probability (OP) as a covariate. Our approach can be easily adapted to an ensemble of variable selection methods, including regularization methods and modern machine learning tools based on classification and regression trees. We evaluate our method for estimating treatment effects on a binary, possibly censored outcome among multiple treatment groups. Simulation studies indicate that incorporating OP when estimating the propensity scores can improve statistical efficiency and protect against model misspecification. The proposed methods are applied to a cohort of advanced-stage prostate cancer patients identified from a private insurance claims database to compare the adverse effects of four commonly used drugs for treating castration-resistant prostate cancer.
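To make the idea of using OP as a propensity score covariate concrete, the following is a minimal sketch and not the authors' implementation. It makes several simplifying assumptions that are not in the abstract: a binary treatment instead of multiple treatment groups, no censoring, simulated data with three hypothetical covariates, unpenalized logistic regression fitted by plain gradient ascent in place of the regularized or tree-based learners the abstract mentions, and a normalized inverse probability weighting estimator of the risk difference. The helper names (`fit_logistic`, `predict_proba`) are illustrative only.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, steps=300, lr=0.5):
    """Gradient ascent on the mean log-likelihood; returns [intercept, coefs...]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            resid = yi - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += resid
            for j in range(p):
                grad[j + 1] += resid * xi[j]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def predict_proba(w, X):
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]

# Simulated data (assumption): x[0] is a confounder, x[1] affects treatment
# only, and x[2] affects the outcome only.
random.seed(0)
n = 400
X, T, Y = [], [], []
for _ in range(n):
    x = [random.gauss(0, 1) for _ in range(3)]
    t = 1 if random.random() < sigmoid(0.8 * x[0] + 0.8 * x[1]) else 0
    y = 1 if random.random() < sigmoid(-0.5 + 0.5 * t + 0.8 * x[0] + 0.8 * x[2]) else 0
    X.append(x)
    T.append(t)
    Y.append(y)

# Step 1: outcome model on the covariates gives the predicted outcome probability.
op = predict_proba(fit_logistic(X, Y), X)

# Step 2: propensity score model with OP appended as an extra covariate.
X_ps = [xi + [o] for xi, o in zip(X, op)]
ps = predict_proba(fit_logistic(X_ps, T), X_ps)
# Clip extreme scores to keep the weights bounded (a choice made for this sketch).
ps = [min(max(e, 0.01), 0.99) for e in ps]

# Step 3: normalized inverse probability weighting estimate of the risk difference.
w1 = [t / e for t, e in zip(T, ps)]
w0 = [(1 - t) / (1 - e) for t, e in zip(T, ps)]
risk1 = sum(w * y for w, y in zip(w1, Y)) / sum(w1)
risk0 = sum(w * y for w, y in zip(w0, Y)) / sum(w0)
ate = risk1 - risk0
print(f"Estimated risk difference: {ate:.3f}")
```

In this sketch OP enters the propensity score model alongside the raw covariates. With a high-dimensional covariate set, one would presumably replace `fit_logistic` with a penalized or tree-based learner and let the selection procedure decide which covariates remain next to OP.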