Latent Gaussian models and boosting are widely used techniques in statistics and machine learning. Tree-boosting achieves excellent prediction accuracy on many data sets, but it has potential drawbacks: it assumes conditional independence of samples, it produces discontinuous predictions for, e.g., spatial data, and it can have difficulty with high-cardinality categorical variables. Latent Gaussian models, such as Gaussian process and grouped random effects models, are flexible prior models that explicitly model dependence among samples and that allow for efficient learning of predictor functions and for making probabilistic predictions. However, existing latent Gaussian models usually assume either a zero or a linear prior mean function, which can be an unrealistic assumption. This article introduces a novel approach that combines boosting and latent Gaussian models to remedy the above-mentioned drawbacks and to leverage the advantages of both techniques. We obtain increased prediction accuracy compared to existing approaches in both simulated and real-world data experiments.