Missing data often result in undesirable bias and loss of efficiency. These problems become substantial when the response mechanism is nonignorable, meaning that the response model depends on unobserved variables. To handle nonignorable nonresponse, it is necessary to estimate the joint distribution of the unobserved variables and the response indicators. However, even when this joint distribution is estimated carefully, model misspecification and identification issues can preclude robust estimation. In this study, we model the distribution of the observed parts and derive sufficient conditions for model identifiability, assuming a logistic regression model for the response mechanism and generalized linear models for the main outcome model of interest. More importantly, the derived sufficient conditions do not require any instrumental variables, which are often assumed in order to guarantee identifiability but cannot be determined in practice beforehand. To analyze missing data in applications, we propose practical guidelines and a sensitivity analysis for determining the response mechanism. Furthermore, we present the performance of the proposed estimators in numerical studies and apply the proposed method to two real datasets: exit polls from the 19th South Korean election and public data from the Korean Survey of Household Finances and Living Conditions.
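To make the setting concrete, the following is a minimal sketch, not the authors' estimator. It assumes a normal linear outcome model (one special case of a generalized linear model) and a logistic response model whose probability depends on the outcome y itself, which makes the nonresponse nonignorable. The helper names `simulate`, `complete_case_mean`, and `observed_loglik`, the parameter layout, and the midpoint-rule integration are all hypothetical choices for illustration. The sketch shows that the complete-case mean is biased under such a mechanism and writes down an observed-data log-likelihood: respondents contribute f(y|x)·π(x, y), and nonrespondents contribute the integral of f(y|x)·(1 − π(x, y)) over the unobserved y. It does not check any identifiability condition.

```python
import math
import random


def expit(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2


def simulate(n, alpha0, alpha1, seed=0):
    """Hypothetical data generator: y = 1 + 2x + e, P(respond) = expit(alpha0 + alpha1*y)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        delta = 1 if rng.random() < expit(alpha0 + alpha1 * y) else 0
        # Nonrespondents keep x but lose y.
        data.append((x, y if delta else None, delta))
    return data


def complete_case_mean(data):
    ys = [y for _, y, d in data if d == 1]
    return sum(ys) / len(ys)


def observed_loglik(params, data, grid=400):
    """Observed-data log-likelihood, assuming y|x ~ N(b0 + b1*x, sigma^2)
    and P(respond | x, y) = expit(a0 + a1*x + a2*y)."""
    b0, b1, log_sigma, a0, a1, a2 = params
    sigma = math.exp(log_sigma)
    ll = 0.0
    for x, y, delta in data:
        mu = b0 + b1 * x
        if delta == 1:
            ll += normal_logpdf(y, mu, sigma) + math.log(expit(a0 + a1 * x + a2 * y))
        else:
            # Midpoint rule for the integral of f(t|x) * (1 - pi(x, t)) over mu +/- 8 sigma.
            lo, hi = mu - 8 * sigma, mu + 8 * sigma
            h = (hi - lo) / grid
            s = 0.0
            for k in range(grid):
                t = lo + (k + 0.5) * h
                s += math.exp(normal_logpdf(t, mu, sigma)) * (1.0 - expit(a0 + a1 * x + a2 * t))
            ll += math.log(s * h)
    return ll
```

With `alpha1 = 0` the response is missing completely at random and `complete_case_mean(simulate(20000, 0.0, 0.0))` should be close to the true mean of 1; with `alpha1 > 0` larger outcomes are more likely to be observed, so the complete-case mean drifts upward. Maximizing `observed_loglik` (for example with a generic optimizer) is only meaningful when the model is identified, which is the question the sufficient conditions above address.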