The weighted nearest neighbors (WNN) estimator has been widely used as a flexible and easy-to-implement nonparametric tool for mean regression estimation. The bagging technique is an elegant way to form WNN estimators with weights automatically assigned to the nearest neighbors; for easy reference, we call the resulting estimator the distributional nearest neighbors (DNN) estimator. Yet distributional results for such an estimator are lacking, which limits its application to statistical inference. Moreover, when the mean regression function has higher-order smoothness, DNN does not achieve the optimal nonparametric convergence rate, mainly because of its bias. In this work, we provide an in-depth technical analysis of the DNN, based on which we suggest a bias reduction approach that linearly combines two DNN estimators with different subsampling scales, resulting in the novel two-scale DNN (TDNN) estimator. The two-scale DNN estimator has an equivalent WNN representation with weights that admit explicit forms, some of them negative. We prove that, thanks to the negative weights, the two-scale DNN estimator attains the optimal nonparametric rate of convergence for estimating the regression function under a fourth-order smoothness condition. Going beyond estimation, we establish that the DNN and two-scale DNN estimators are both asymptotically normal as the subsampling scales and the sample size diverge to infinity. For practical implementation, we also provide variance estimators and a distribution estimator for the two-scale DNN based on the jackknife and bootstrap techniques. These estimators can be used to construct valid confidence intervals for nonparametric inference on the regression function. The theoretical results and the appealing finite-sample performance of the suggested two-scale DNN method are illustrated with several numerical examples.
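The WNN form of the DNN estimator and the two-scale combination can be sketched in a few lines of pure Python (3.8+ for `math.comb` and `math.dist`). This is a minimal illustration under stated assumptions, not the authors' implementation, and all function names are hypothetical. The DNN weight on the i-th nearest neighbor, C(n-i, s-1)/C(n, s), follows from the bagging view: it is the probability that the i-th nearest neighbor is the 1-NN of a uniformly random size-s subsample, and `dnn_bruteforce` checks this against explicit enumeration. The TDNN weights rest on an assumption not stated in this abstract: that the leading bias term of the DNN estimator at scale s is proportional to s^(-2/d), with d the covariate dimension. The two weights are chosen to sum to one and cancel that term, which forces one of them to be negative. The variance function is a generic delete-one jackknife and may differ from the paper's exact estimator.

```python
import itertools
import math
import random


def dnn_weights(n, s):
    """WNN weights of the DNN estimator at subsampling scale s.

    The i-th nearest neighbor (1-indexed) is the 1-NN of a random size-s
    subsample iff it is drawn and none of the i-1 closer points are,
    which has probability C(n-i, s-1) / C(n, s).
    """
    if not 1 <= s <= n:
        raise ValueError("need 1 <= s <= n")
    total = math.comb(n, s)
    return [math.comb(n - i, s - 1) / total for i in range(1, n + 1)]


def dnn(x, X, Y, s):
    """DNN estimate at x: weighted sum of responses sorted by distance to x."""
    # Stable sort, so ties in distance are broken by original index.
    order = sorted(range(len(X)), key=lambda j: math.dist(x, X[j]))
    w = dnn_weights(len(X), s)
    return sum(wi * Y[j] for wi, j in zip(w, order))


def dnn_bruteforce(x, X, Y, s):
    """Bagged 1-NN: average the 1-NN response over every size-s subsample.

    Exponential cost; only useful for checking dnn() on tiny inputs.
    """
    vals = []
    for sub in itertools.combinations(range(len(X)), s):
        nearest = min(sub, key=lambda j: math.dist(x, X[j]))
        vals.append(Y[nearest])
    return sum(vals) / len(vals)


def tdnn_weights(s1, s2, d):
    """Solve w1 + w2 = 1 and w1*s1^(-2/d) + w2*s2^(-2/d) = 0.

    Assumption (hypothetical here): the leading DNN bias is proportional
    to s^(-2/d), so this combination removes it.
    """
    if s1 == s2:
        raise ValueError("the two subsampling scales must differ")
    r = (s1 / s2) ** (-2.0 / d)
    return 1.0 / (1.0 - r), -r / (1.0 - r)


def tdnn(x, X, Y, s1, s2):
    """Two-scale DNN: linear combination of DNN estimates at scales s1, s2."""
    w1, w2 = tdnn_weights(s1, s2, len(x))
    return w1 * dnn(x, X, Y, s1) + w2 * dnn(x, X, Y, s2)


def jackknife_variance(x, X, Y, s1, s2):
    """Generic delete-one jackknife variance of the TDNN estimate at x."""
    n = len(X)
    loo = [tdnn(x, X[:i] + X[i + 1:], Y[:i] + Y[i + 1:], s1, s2)
           for i in range(n)]
    mean = sum(loo) / n
    return (n - 1) / n * sum((t - mean) ** 2 for t in loo)


if __name__ == "__main__":
    # Toy data: Y = ||x||^2 + Gaussian noise on the unit square.
    random.seed(0)
    n, d = 60, 2
    X = [tuple(random.random() for _ in range(d)) for _ in range(n)]
    Y = [sum(v * v for v in p) + random.gauss(0.0, 0.1) for p in X]
    x0 = (0.5, 0.5)
    est = tdnn(x0, X, Y, s1=10, s2=20)
    se = math.sqrt(jackknife_variance(x0, X, Y, s1=10, s2=20))
    print(f"TDNN estimate {est:.3f}, jackknife SE {se:.3f}")
```

With s1 < s2 the weight on the smaller-scale estimator is negative; for example, `tdnn_weights(5, 10, 2)` returns `(-1.0, 2.0)`. The naive jackknife loop refits the estimator n times, so it is only practical for modest n in this form.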