In problems with large amounts of missing data, one must model two distinct data-generating processes: the outcome process, which generates the response, and the missing data mechanism, which determines the data we observe. Under the ignorability condition of Rubin (1976), however, likelihood-based inference for the outcome process does not depend on the missing data mechanism, so only the former needs to be estimated; partly because of this simplification, ignorability is often used as a baseline assumption. We study the implications of Bayesian ignorability in the presence of high-dimensional nuisance parameters and argue that ignorability is typically incompatible with sensible prior beliefs about the amount of selection bias. We show that, for many problems, ignorability directly implies that the prior on the selection bias is tightly concentrated around zero. This is demonstrated on several models of practical interest, and the effect of ignorability on the posterior distribution is characterized for high-dimensional linear models with a ridge regression prior. We then show how to build high-dimensional models that encode sensible beliefs about the selection bias, and we show that under certain narrow circumstances ignorability is less problematic.