For supervised classification problems, this paper considers estimating the query's label probability through local regression using the observed covariates. The well-known nonparametric kernel smoother and the $k$-nearest neighbor ($k$-NN) estimator, which average the labels over a ball centered at the query, are consistent but asymptotically biased, particularly when the radius of the ball is large. To eliminate this bias, local polynomial regression (LPoR) and multiscale $k$-NN (MS-$k$-NN) learn the bias term by local regression around the query and extrapolate it to the query itself. However, their theoretical optimality has been shown only in the limit of infinitely many training samples. To correct the asymptotic bias with fewer observations, this paper proposes local radial regression (LRR) and its logistic regression variant, local radial logistic regression (LRLR), which combine the advantages of LPoR and MS-$k$-NN. The idea is simple: we fit a local regression to the observed labels, taking the radial distance as the explanatory variable, and then extrapolate the estimated label probability to zero distance. Our numerical experiments, including real-world datasets of daily stock indices, demonstrate that LRLR outperforms LPoR and MS-$k$-NN.
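As a rough illustration of the idea stated in the abstract, the sketch below estimates the label probability at a query by fitting a logistic regression of the $k$ nearest labels on polynomial features of the radial distance, then extrapolating to zero distance. This is a minimal sketch under assumed choices (Euclidean distance, binary labels, a hypothetical `degree` parameter), not the paper's implementation.

```python
# Minimal sketch of local radial logistic regression (LRLR).
# Assumptions: binary labels in {0, 1}, Euclidean metric, and
# polynomial features of the radial distance; all names here
# (lrlr_predict_proba, k, degree) are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def lrlr_predict_proba(X_train, y_train, query, k=50, degree=2):
    """Estimate P(y=1 | query) by local radial logistic regression."""
    # Radial distances from the query to all training points.
    r = np.linalg.norm(X_train - query, axis=1)
    idx = np.argsort(r)[:k]                  # k nearest neighbors
    r_local, y_local = r[idx], y_train[idx]
    if y_local.min() == y_local.max():       # all neighbors share one label
        return float(y_local[0])
    # Polynomial features [r, r^2, ...] of the radial distance.
    R = np.vander(r_local, N=degree + 1, increasing=True)[:, 1:]
    clf = LogisticRegression().fit(R, y_local)
    # Extrapolate to r = 0: every polynomial feature vanishes there,
    # so the estimate is the sigmoid of the fitted intercept alone.
    return 1.0 / (1.0 + np.exp(-clf.intercept_[0]))
```

At zero radial distance all regressors vanish, so the extrapolated probability reduces to the sigmoid of the intercept; this is what makes the zero-distance extrapolation computationally trivial once the local fit is done.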