Imputation with verifiable identification condition for nonignorable missing outcomes

Missing data often result in undesirable bias and loss of efficiency. These problems become substantial when the response mechanism is nonignorable, that is, when the response model depends on the unobserved variable. Handling nonignorable nonresponse often requires estimating the joint distribution of the unobserved variables and the response indicators. However, model misspecification and identification issues can prevent robust estimation even when the target joint distribution is estimated carefully. In this study, we model the distribution of the observed parts and derive sufficient conditions for model identifiability, assuming a logistic model for the response mechanism and a generalized linear model as the main outcome model of interest. More importantly, the derived sufficient conditions are testable with the observed data and do not require any instrumental variables, which have often been assumed in order to guarantee model identifiability but cannot be practically determined beforehand. To analyze missing data, we propose a new fractional imputation method that incorporates verifiable identifiability using only the observed data. Furthermore, we examine the performance of the proposed estimators in numerical studies and apply the proposed method to two real data sets: an opinion poll for the 2022 South Korean presidential election and public data collected from the US National Supported Work Evaluation Study.
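To make the notion of a nonignorable response mechanism concrete, the following is a minimal simulation sketch and not the proposed estimator. It assumes a linear outcome model (a special case of a generalized linear model) and a logistic response probability that depends on the outcome itself. The function name `simulate_complete_case_bias`, the coefficient values, and the parameter `beta` are all hypothetical choices made for illustration only.

```python
import math
import random


def expit(t):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def simulate_complete_case_bias(n=20000, beta=1.0, seed=0):
    """Return (full-sample mean, respondent-only mean) of the outcome y.

    Illustrative assumptions (not taken from the paper):
      y = 1 + 0.5 * x + e,  x ~ N(0, 1),  e ~ N(0, 1)
      P(respond | y) = expit(0.5 - beta * y)
    With beta != 0 the response probability depends on y itself,
    which is what makes the mechanism nonignorable.
    """
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 0.5 * x + rng.gauss(0.0, 1.0)
        p_respond = expit(0.5 - beta * y)
        full.append(y)
        if rng.random() < p_respond:
            observed.append(y)
    return sum(full) / len(full), sum(observed) / len(observed)


if __name__ == "__main__":
    for beta in (0.0, 1.0):
        full_mean, cc_mean = simulate_complete_case_bias(beta=beta)
        print(f"beta={beta}: full mean={full_mean:.3f}, "
              f"complete-case mean={cc_mean:.3f}")
```

Under these assumptions, setting `beta=0.0` gives response that is unrelated to the outcome, so the respondent-only mean stays close to the full-sample mean. Setting `beta=1.0` makes units with large outcomes less likely to respond, so the respondent-only mean is pulled downward. This is the kind of bias that motivates modeling the response mechanism jointly with the outcome.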
