Missing data often result in undesirable bias and loss of efficiency. These become substantial problems when the response mechanism is nonignorable, such that the response model depends on unobserved variables. It is necessary to estimate the joint distribution of unobserved variables and response indicators to manage nonignorable nonresponse. However, model misspecification and identification issues prevent robust estimates despite careful estimation of the target joint distribution. In this study, we modelled the distribution of the observed parts and derived sufficient conditions for model identifiability, assuming a logistic regression model as the response mechanism and generalised linear models as the main outcome model of interest. More importantly, the derived sufficient conditions are testable with the observed data and do not require any instrumental variables, which are often assumed to guarantee model identifiability but cannot be practically determined beforehand. To analyse missing data, we propose a new imputation method which incorporates verifiable identifiability using only observed data. Furthermore, we present the performance of the proposed estimators in numerical studies and apply the proposed method to two sets of real data: exit polls for the 19th South Korean election data and public data collected from the Korean Survey of Household Finances and Living Conditions.
翻译:暂无翻译