We present a novel method of stacking decision trees by projecting samples into an ordered, time-split, out-of-fold (OOF) one-nearest-neighbor (1NN) space. The predictions of these one nearest neighbors are combined through a linear model, and the whole process is repeated many times and averaged to reduce variance. The resulting Generalized Linear Tree Space Nearest Neighbor (GLTSNN) model is competitive in Mean Squared Error (MSE) with Random Forest (RF) on several publicly available datasets. Some of the theoretical and applied advantages of GLTSNN are discussed. We conjecture that a classifier based upon the GLTSNN would have an error that is asymptotically bounded by twice the Bayes error rate, as holds for k = 1 nearest neighbor.
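For context, the closing conjecture appeals to the classical Cover and Hart (1967) result for binary classification: as the sample size grows, the 1NN risk $R_{1\mathrm{NN}}$ is squeezed between the Bayes risk $R^{*}$ and twice the Bayes risk,

\[
R^{*} \;\le\; R_{1\mathrm{NN}} \;\le\; 2R^{*}\left(1 - R^{*}\right) \;\le\; 2R^{*}.
\]

The conjecture is that a GLTSNN-based classifier inherits the same upper bound; the inequality above is the standard 1NN statement, not a proven property of GLTSNN.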
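To make the pipeline described above concrete, the following is a minimal sketch in Python. It assumes that the "tree space" projection is the vector of terminal-node (leaf) assignments, that the OOF 1NN of a sample under a given tree is the nearest training point sharing its leaf, and that a single past/future cut stands in for the ordered time split; the names `gltsnn_predict`, `n_trees`, and `n_repeats` are illustrative, and none of these choices are taken from the paper's implementation.

```python
# Illustrative GLTSNN-style regressor sketch, not the authors' code.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression


def gltsnn_predict(X, y, X_test, n_trees=10, n_repeats=20, seed=0):
    rng = np.random.default_rng(seed)
    cut = len(y) // 2                       # ordered time split: past | OOF
    X_past, y_past = X[:cut], y[:cut]
    X_oof, y_oof = X[cut:], y[cut:]

    total = np.zeros(len(X_test))
    for _ in range(n_repeats):              # repeat and average to reduce variance
        oof_cols, test_cols = [], []
        for _ in range(n_trees):
            # Randomized decision tree grown only on the "past" portion.
            tree = DecisionTreeRegressor(
                max_features="sqrt", min_samples_leaf=5,
                random_state=int(rng.integers(2**31)),
            ).fit(X_past, y_past)
            leaves_past = tree.apply(X_past)

            def one_nn(Xq):
                # Target of the nearest past sample in the same leaf:
                # a 1NN restricted to the tree-space projection.
                leaves_q = tree.apply(Xq)
                out = np.empty(len(Xq))
                for i in range(len(Xq)):
                    mask = leaves_past == leaves_q[i]
                    d = np.linalg.norm(X_past[mask] - Xq[i], axis=1)
                    out[i] = y_past[mask][np.argmin(d)]
                return out

            oof_cols.append(one_nn(X_oof))
            test_cols.append(one_nn(X_test))

        # Linear model stacks the per-tree 1NN predictions, fit on OOF data.
        lm = LinearRegression().fit(np.column_stack(oof_cols), y_oof)
        total += lm.predict(np.column_stack(test_cols))
    return total / n_repeats
```

In this sketch the randomness across repetitions comes from the `max_features="sqrt"` split sampling, so averaging the stacked linear models over repeats plays the variance-reduction role the abstract describes.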