Random forests are an ensemble method applicable to many problems, such as regression and classification. They are popular because they offer good predictive performance (compared to, e.g., single decision trees) while requiring only minimal hyperparameter tuning. They are built by aggregating multiple regression trees during training, and the trees are usually grown recursively using hard splitting rules. Recently, regression forests have been incorporated into the framework of distributional regression, a now-popular approach that aims to estimate complete conditional distributions rather than, as is done classically, relating only the mean of an output variable to input features. This article proposes a new type of distributional regression tree that uses a multivariate soft split rule. One great advantage of the soft split is that smooth high-dimensional functions can be estimated with only one tree, while the complexity of the function is controlled adaptively by information criteria. Moreover, the search for the optimal split variable becomes unnecessary. We show by means of extensive simulation studies that the algorithm has excellent properties and outperforms various benchmark methods, especially in the presence of complex non-linear feature interactions. Finally, we illustrate the usefulness of our approach with an example on probabilistic forecasts for the Sun's activity.
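To make the contrast between hard and soft splitting concrete, the following minimal Python sketch compares a classical axis-aligned split with one possible multivariate soft split. It is not the algorithm proposed in the article. The logistic gate, the function names (`hard_split_weight`, `soft_split_weight`, `predict_params`), and the linear blending of Gaussian leaf parameters are all assumptions made purely for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def hard_split_weight(X, j, threshold):
    """Classical hard split on a single feature j.

    Returns 1.0 for rows routed to the right child, 0.0 otherwise.
    """
    return (X[:, j] > threshold).astype(float)


def soft_split_weight(X, w, b):
    """Hypothetical multivariate soft split (an assumption, not the paper's rule).

    All features enter through the linear score X @ w + b, so no single
    split variable has to be searched for. The result is a routing
    weight in (0, 1) for the right child.
    """
    return sigmoid(X @ w + b)


def predict_params(weight_right, left, right):
    """Blend the leaf parameters (mu, sigma) by the routing weight.

    With hard weights this selects one leaf; with soft weights the
    predicted distribution parameters vary smoothly with the features.
    Linear blending is an illustrative choice only.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    wr = np.asarray(weight_right, dtype=float)[:, None]
    return (1.0 - wr) * left + wr * right


# Three observations with two features each.
X = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, -10.0]])

hard = hard_split_weight(X, j=0, threshold=0.0)
soft = soft_split_weight(X, w=np.array([1.0, 1.0]), b=0.0)

# Leaf parameters (mu, sigma) of a Gaussian response in each child.
params_hard = predict_params(hard, left=(0.0, 1.0), right=(2.0, 3.0))
params_soft = predict_params(soft, left=(0.0, 1.0), right=(2.0, 3.0))
```

In this sketch an observation on the decision boundary receives a routing weight of 0.5 and therefore parameters halfway between the two leaves, whereas the hard split assigns it entirely to one child.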