Recently, the fitting of full probabilistic models has gained importance in many areas, but estimating such distributional models with very large data sets is a difficult task. In particular, the use of rather complex models can easily lead to memory-related efficiency problems that make estimation infeasible even on high-performance computers. We therefore propose a novel backfitting algorithm, which is based on the ideas of stochastic gradient descent and can deal with virtually any amount of data on a conventional laptop. The algorithm performs automatic selection of variables and smoothing parameters, and its performance is in most cases superior to, or at least on par with, other implementations for structured additive distributional regression, e.g., gradient boosting, while maintaining low computation time. Performance is evaluated using an extensive simulation study and an exceptionally challenging and unique example of lightning count prediction over Austria. A very large dataset with over 9 million observations and 80 covariates is used, so that a prediction model cannot be estimated with standard distributional regression methods but only with our new approach.
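To illustrate the underlying idea, the following is a minimal sketch, not the proposed algorithm itself: a Gaussian distributional regression model in which both the mean and the log standard deviation are linear predictors, updated in turn on mini-batches via stochastic gradient steps on the negative log-likelihood. All names, data, learning rate, and batch size here are illustrative assumptions; the actual method additionally handles smooth terms, variable selection, and smoothing parameter selection.

```python
# Minimal sketch (assumptions only): batchwise gradient updates for a
# Gaussian distributional regression with linear predictors for the mean
# and the log standard deviation.
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: intercept plus three covariates; both distribution
# parameters depend on the covariates.
n = 100_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
coef_mu_true = np.array([0.5, 1.0, -0.5, 0.0])
coef_sd_true = np.array([-0.3, 0.1, 0.0, 0.0])
y = rng.normal(X @ coef_mu_true, np.exp(X @ coef_sd_true))
p = X.shape[1]

beta_mu = np.zeros(p)      # coefficients of the mean predictor
beta_sd = np.zeros(p)      # coefficients of the log-sd predictor
lr, batch = 0.05, 1_000    # illustrative step size and mini-batch size

for epoch in range(5):
    idx = rng.permutation(n)
    for start in range(0, n, batch):
        b = idx[start:start + batch]
        Xb, yb = X[b], y[b]

        # Update the mean predictor on this batch
        # (gradient of the Gaussian negative log-likelihood w.r.t. mu).
        w = np.exp(-2.0 * (Xb @ beta_sd))
        resid = yb - Xb @ beta_mu
        beta_mu += lr * (Xb.T @ (resid * w)) / len(b)

        # Update the log-sd predictor using the refreshed mean fit,
        # cycling over predictors as in a backfitting step.
        resid = yb - Xb @ beta_mu
        beta_sd -= lr * (Xb.T @ (1.0 - resid**2 * w)) / len(b)

print("estimated mean coefficients  :", np.round(beta_mu, 2))
print("estimated log-sd coefficients:", np.round(beta_sd, 2))
```

Because each update touches only one mini-batch, memory use is bounded by the batch size rather than the full data set, which is the property that makes estimation on very large data feasible on ordinary hardware.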