Nearest neighbor (NN) matching, as a tool to align data sampled from different groups, is both conceptually natural and widely used in practice. In a landmark paper, Abadie and Imbens (2006) provided the first large-sample analysis of NN matching, under, however, the crucial assumption that the number of NNs, $M$, is fixed. This manuscript reveals something new from their analysis: once $M$ is allowed to diverge with the sample size, an intrinsic statistic in their framework constitutes a consistent estimator of the density ratio. Moreover, with a suitably chosen $M$, this statistic attains the minimax lower bound of estimation over a Lipschitz density function class. Consequently, with a diverging $M$, NN matching provably yields a doubly robust estimator of the average treatment effect and is semiparametrically efficient if the density functions are sufficiently smooth and the outcome model is appropriately specified. It can thus be viewed as a precursor of double machine learning estimators.
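As a hedged illustration of the intrinsic statistic the abstract refers to (a sketch under stated assumptions, not the paper's own code): if $K_M(i)$ counts how many times control unit $i$ appears among the $M$ nearest control neighbors of the treated units, then, heuristically, $\mathbb{E}[K_M(i)] \approx M \,(n_1/n_0)\, f_1(X_i)/f_0(X_i)$, so the rescaled count $(n_0/n_1)\,K_M(i)/M$ behaves like a density-ratio estimate at $X_i$. The Python sketch below, with a hypothetical helper `matching_counts` and simulated one-dimensional Gaussian data, computes these counts with scikit-learn and compares the rescaled statistic against the true density ratio.

```python
# Minimal illustrative sketch, not the paper's implementation.
import numpy as np
from scipy.stats import norm
from sklearn.neighbors import NearestNeighbors

def matching_counts(x_treated, x_control, M):
    """K_M(i): how often control unit i appears among the
    M nearest control neighbors of the treated units."""
    nn = NearestNeighbors(n_neighbors=M).fit(x_control)
    _, idx = nn.kneighbors(x_treated)               # indices: shape (n1, M)
    return np.bincount(idx.ravel(), minlength=len(x_control))

rng = np.random.default_rng(0)
n1, n0, M = 2000, 2000, 50                          # M chosen to diverge in theory
x1 = rng.normal(0.5, 1.0, size=(n1, 1))             # treated covariates ~ f_1
x0 = rng.normal(0.0, 1.0, size=(n0, 1))             # control covariates ~ f_0

K = matching_counts(x1, x0, M)
ratio_hat = (n0 / n1) * K / M                       # estimate of f_1(x)/f_0(x) at controls

# Informal check against the true density ratio at the control points
true_ratio = norm.pdf(x0[:, 0], 0.5, 1.0) / norm.pdf(x0[:, 0], 0.0, 1.0)
print("correlation with truth:", np.corrcoef(ratio_hat, true_ratio)[0, 1])
```

In this toy setup the rescaled matching counts track the true ratio closely; the abstract's minimax and double-robustness claims concern the precise rate at which $M$ must grow with the sample size, which this sketch does not attempt to verify.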