When data are partially missing at random, imputation and importance weighting are often used to estimate moments of the unobserved population. In this paper, we study 1-nearest-neighbor (1NN) importance weighting, which estimates moments by replacing each missing observation with the complete observation that is its nearest neighbor in the space of non-missing covariates. We define an empirical measure, the 1NN measure, and show that it is weakly consistent for the measure of the missing data. The main idea behind this result is that the 1NN measure performs inverse probability weighting in the limit. We study applications to missing data and to mitigating the impact of covariate shift in prediction tasks.
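To make the 1NN weighting idea concrete, the following is a minimal sketch rather than the paper's implementation. It assumes Euclidean distance on the observed covariates and breaks ties by taking the lowest index. The function names `one_nn_weights` and `one_nn_moment` and the toy data are hypothetical, introduced only for illustration. Each complete case receives a weight equal to the fraction of incomplete cases for which it is the nearest neighbor, and a moment of the missing population is estimated as the weighted sum over complete cases.

```python
import numpy as np


def one_nn_weights(x_complete, x_missing):
    """Weight for each complete case: the fraction of incomplete cases
    whose nearest complete case (in covariate space) is that one.

    Assumption: Euclidean distance; ties go to the lowest index.
    """
    x_complete = np.asarray(x_complete, dtype=float)
    x_missing = np.asarray(x_missing, dtype=float)
    # Pairwise squared distances, shape (n_missing, n_complete).
    diffs = x_missing[:, None, :] - x_complete[None, :, :]
    sq_dists = np.einsum("mnd,mnd->mn", diffs, diffs)
    # Index of the nearest complete case for every incomplete case.
    nearest = np.argmin(sq_dists, axis=1)
    counts = np.bincount(nearest, minlength=len(x_complete))
    return counts / len(x_missing)


def one_nn_moment(y_complete, weights, g=lambda y: y):
    """Estimate E[g(Y)] over the missing population as a weighted sum."""
    y_complete = np.asarray(y_complete, dtype=float)
    return float(np.sum(weights * g(y_complete)))


# Toy data (hypothetical): three complete cases, four cases with Y missing.
x_complete = [[0.0], [1.0], [2.0]]
y_complete = [10.0, 20.0, 30.0]
x_missing = [[0.1], [1.9], [2.2], [2.4]]

weights = one_nn_weights(x_complete, x_missing)
estimate = one_nn_moment(y_complete, weights)
print(weights)   # weights are 0.25, 0.0, 0.75
print(estimate)  # 25.0
```

The brute-force distance matrix keeps the sketch dependency-light; for large samples a spatial index such as a k-d tree would be the more natural choice.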