Semiparametric estimation, which includes inverse-probability-weighted and doubly robust estimation using propensity scores, is a standard tool for marginal structural models in causal inference, and it is rapidly being extended in various directions. Although model selection is indispensable in statistical analysis, the development of information criteria for selecting an appropriate marginal structure has only just begun. In this paper, we derive an Akaike-type information criterion on the basis of the original definition of the information criterion. We define a risk function based on the Kullback-Leibler divergence as the cornerstone of the information criterion and treat a general causal inference model that is not necessarily linear. The causal effects to be estimated are those in the general population, such as the average treatment effect on the treated or the average treatment effect on the untreated. Because this field attaches importance to doubly robust estimation, which allows either the model of the assignment variable or the model of the outcome variable to be misspecified, we make the information criterion itself doubly robust, so that it remains a mathematically valid criterion when either one of the two models is wrong. In simulation studies, we compare the derived criterion with an existing criterion obtained from a formal argument and confirm that the former outperforms the latter. Specifically, in all simulation settings, the divergence between the structure estimated with the derived criterion and the true structure is clearly smaller, and the probability of selecting the true or nearly true model is clearly higher. Real data analyses confirm that the results of variable selection using the two criteria differ significantly.