This paper extends recent work on boosting random forests to model non-Gaussian responses. Given an exponential-family model with $\mathbb{E}[Y|X] = g^{-1}(f(X))$, where $g$ is the link function, our goal is to obtain an estimate of $f$. We start with an MLE-type estimate in the link space and then define generalised residuals from it. We use these residuals, together with corresponding weights, to fit a base random forest, and then repeat the same procedure to obtain a boost random forest. We call the sum of these three estimators a \textit{generalised boosted forest}. We show with simulated and real data that both random forest steps improve test-set log-likelihood, which we treat as our primary metric. We also provide a variance estimator, which can be obtained at the same computational cost as the original estimate itself. Empirical experiments on real-world data and simulations demonstrate that the method can effectively reduce bias, and that confidence interval coverage is conservative in the bulk of the covariate distribution.
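The three-stage construction described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the paper's implementation: it assumes a Poisson response with a log link, it assumes a Newton-style definition of the generalised residuals and weights ($r_i = -\ell_i'/\ell_i''$, $w_i = -\ell_i''$), and it swaps the random forest for a single weighted regression stump on a scalar covariate so that the example stays dependency-free. All function names are hypothetical.

```python
import math

def poisson_loglik(y, eta):
    # Poisson log-likelihood in the link space, dropping the log(y!) constant.
    return sum(yi * e - math.exp(e) for yi, e in zip(y, eta))

def generalised_residuals(y, eta):
    # Assumed Newton-style form for the log link:
    # r_i = -l'/l'' = (y_i - mu_i) / mu_i and w_i = -l'' = mu_i.
    mu = [math.exp(e) for e in eta]
    r = [(yi - m) / m for yi, m in zip(y, mu)]
    return r, mu

def fit_weighted_stump(x, r, w):
    # Stand-in base learner (NOT a random forest): one split on a scalar
    # covariate, predicting the weighted mean residual on each side.
    # Assumes x contains at least two distinct values.
    n = len(x)

    def wmean(idx):
        return sum(w[i] * r[i] for i in idx) / sum(w[i] for i in idx)

    best = None
    for s in sorted(set(x))[:-1]:
        left = [i for i in range(n) if x[i] <= s]
        right = [i for i in range(n) if x[i] > s]
        ml, mr = wmean(left), wmean(right)
        # Weighted squared error of this candidate split.
        sse = sum(w[i] * (r[i] - (ml if x[i] <= s else mr)) ** 2 for i in range(n))
        if best is None or sse < best[0]:
            best = (sse, s, ml, mr)
    _, s, ml, mr = best
    return lambda xi: ml if xi <= s else mr

def generalised_boost(x, y):
    # Stage 0: constant MLE in the link space (log of the sample mean).
    eta0 = math.log(sum(y) / len(y))
    eta = [eta0] * len(y)
    stages = []
    for _ in range(2):  # base step, then boost step
        r, w = generalised_residuals(y, eta)
        h = fit_weighted_stump(x, r, w)
        stages.append(h)
        eta = [e + h(xi) for e, xi in zip(eta, x)]

    def predict(xi):
        # Sum of the three estimators: constant + base + boost.
        return eta0 + sum(h(xi) for h in stages)

    return eta0, stages, predict

# Toy usage with made-up data.
x = [0, 1, 2, 3]
y = [1, 1, 3, 3]
eta0, stages, predict = generalised_boost(x, y)
fitted_means = [math.exp(predict(xi)) for xi in x]
```

On this toy data each stage raises the training log-likelihood, but a Newton-style step does not guarantee that in general. With a real random forest, the weights would typically be passed to the learner as per-observation sample weights.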