Surveys are rife with nonresponse, and in many situations the data are missing not at random, which is a major source of bias in statistical inference. At the same time, surveys are rich in paradata that track the data collection process. A traditional form of paradata is callback data, which record contact attempts. Although callback data have long been recognized as useful for nonresponse adjustment, they were not widely used in statistical analysis until recently. In particular, a few existing approaches use callback data to estimate response propensity scores, but they rest on fully parametric models and fairly stringent assumptions. In this paper, we propose a stableness of resistance assumption for identifying the propensity scores and the outcome distribution of interest without imposing any parametric restrictions. We establish the semiparametric efficiency theory, derive the efficient influence function, and propose a suite of semiparametric estimation methods, including doubly robust ones, that generalize existing parametric approaches. We also consider an extension of this framework to causal inference, where it is used to adjust for unmeasured confounding. An application to a Consumer Expenditure Survey dataset suggests an association between nonresponse and high housing expenditures, and a reanalysis of the Card (1995) dataset on the return to schooling shows a smaller effect of education in the overall population than among the respondents.