Biased sampling and missing data complicate statistical problems ranging from causal inference to reinforcement learning. Matching methods and importance weighting are often used to correct summary statistics for biased sampling. In this paper, we study nearest neighbor matching (NNM), which estimates population quantities from a biased sample by substituting unobserved variables with their nearest neighbors in the biased sample. We show that, in finite dimensions, NNM is $L_2$-consistent without smoothness or boundedness assumptions. We discuss applications of NNM, outline the barriers to generalizing this work to separable metric spaces, and compare this result to inverse probability weighting.
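The substitution step that NNM performs can be illustrated with a minimal sketch. The setup here is an assumption rather than the paper's exact formulation: covariates `x_target` are drawn from the population of interest without outcomes, outcomes `y_biased` are observed only at the biased-sample covariates `x_biased`, distance is Euclidean, and the population quantity is a mean. The function name `nnm_estimate` and all variable names are hypothetical.

```python
import numpy as np


def nnm_estimate(x_target, x_biased, y_biased):
    """Estimate a population mean outcome by nearest neighbor matching.

    Hypothetical helper for illustration: each target point, whose outcome
    is unobserved, borrows the outcome of its Euclidean nearest neighbor
    in the biased sample; the borrowed outcomes are then averaged.
    """
    # Coerce inputs to 2-D arrays of shape (n_points, n_features).
    x_target = np.asarray(x_target, dtype=float).reshape(len(x_target), -1)
    x_biased = np.asarray(x_biased, dtype=float).reshape(len(x_biased), -1)
    y_biased = np.asarray(y_biased, dtype=float)

    # Pairwise squared Euclidean distances, shape (n_target, n_biased).
    # This brute-force version uses O(n_target * n_biased * n_features)
    # memory, which is fine for small examples only.
    diffs = x_target[:, None, :] - x_biased[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)

    # Index of the nearest biased-sample point for every target point
    # (ties go to the lowest index).
    nearest = np.argmin(sq_dists, axis=1)

    # Substitute the matched outcomes and average them.
    return float(np.mean(y_biased[nearest]))


# Tiny worked example: the target points 1.2 and 2.0 both match the biased
# point 1.9, and 0.0 matches 0.1, so the estimate is (10 + 20 + 20) / 3.
estimate = nnm_estimate([0.0, 1.2, 2.0], [0.1, 1.9], [10.0, 20.0])
print(estimate)
```

The brute-force distance matrix keeps the sketch dependency-free; for larger samples a spatial index such as a k-d tree would be the usual substitute, at the cost of an extra library.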