Regression trees (RTs) have been widely used in the machine learning and data mining communities. To predict a target, a regression tree is first constructed from a training dataset, and a prediction is then made at each leaf node. In practice, the performance of an RT relies heavily on the local mean of the samples in an individual node during the tree construction and prediction stages, while neglecting global information from other nodes, which also plays an important role. To address this issue, we propose a novel regression tree, named the James-Stein Regression Tree (JSRT), that takes global information from different nodes into account. Specifically, we incorporate global mean information from different nodes, based on the James-Stein estimator, during the construction and prediction stages. In addition, we analyze the generalization error of our method under the mean squared error (MSE) metric. Extensive experiments on public benchmark datasets verify the effectiveness and efficiency of JSRT and show that it outperforms other RT prediction methods.
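To illustrate the idea of combining local node means with global mean information, the following is a minimal sketch of positive-part James-Stein shrinkage applied to a set of leaf means. It is not the exact JSRT procedure: the function name `james_stein_leaf_means` and the parameter `var_of_mean` are hypothetical, and the sketch assumes that every leaf mean has the same known sampling variance.

```python
import numpy as np


def james_stein_leaf_means(leaf_means, var_of_mean):
    """Shrink k leaf means toward their grand mean (positive-part James-Stein).

    Assumption: each leaf mean has the same known sampling variance
    `var_of_mean`. This is an illustrative simplification.
    """
    leaf_means = np.asarray(leaf_means, dtype=float)
    k = leaf_means.size
    if k < 4:
        # With an estimated grand mean as the target, the classical
        # dominance result needs at least 4 means, so leave them unchanged.
        return leaf_means.copy()

    grand_mean = leaf_means.mean()          # global information
    deviations = leaf_means - grand_mean    # local departures from it
    spread = np.sum(deviations ** 2)
    if spread == 0.0:
        # All leaf means already coincide; nothing to shrink.
        return leaf_means.copy()

    # Positive-part shrinkage factor, clipped at 0 to avoid over-shrinking.
    shrink = max(0.0, 1.0 - (k - 3) * var_of_mean / spread)
    return grand_mean + shrink * deviations


# Usage: five leaf means with unit sampling variance.
# Shrinkage factor is 1 - 2 * 1 / 10 = 0.8, so the result is
# approximately [1.4, 2.2, 3.0, 3.8, 4.6].
shrunk = james_stein_leaf_means([1.0, 2.0, 3.0, 4.0, 5.0], var_of_mean=1.0)
```

When the variance is large relative to the spread of the leaf means, the factor is clipped to zero and every leaf predicts the grand mean; when it is small, the leaf means are left almost unchanged.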