To minimize the average of a set of log-convex functions, the stochastic Newton method iteratively updates its estimate using subsampled versions of the full objective's gradient and Hessian. We frame this optimization problem as sequential Bayesian inference on a latent state-space model with a discriminatively specified observation process. Applying Bayesian filtering then yields a novel optimization algorithm that considers the entire history of gradients and Hessians when forming an update. We establish matrix-based conditions under which the effect of older observations diminishes over time, in a manner analogous to Polyak's heavy ball momentum. We illustrate various aspects of our approach with an example and review other relevant innovations for the stochastic Newton method.
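To make the update in the first sentence concrete, the following is a minimal Python sketch of one subsampled Newton step on a toy average of log-convex components; the component functions, batch size, and damping term are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def subsampled_newton_step(x, grads, hessians, batch, damping=1e-6):
    """One stochastic Newton update built from a subsample of components."""
    g = np.mean([grads[i](x) for i in batch], axis=0)     # subsampled gradient
    H = np.mean([hessians[i](x) for i in batch], axis=0)  # subsampled Hessian
    H += damping * np.eye(len(x))                         # guard against a singular subsample
    return x - np.linalg.solve(H, g)

# Toy problem (an assumption for illustration): minimize the average of
# n log-convex components f_i(x) = exp(a_i . x).
n, d = 100, 3
A = rng.normal(size=(n, d))
grads = [lambda x, a=a: np.exp(a @ x) * a for a in A]
hessians = [lambda x, a=a: np.exp(a @ x) * np.outer(a, a) for a in A]

x = np.zeros(d)
for k in range(50):
    batch = rng.choice(n, size=10, replace=False)
    x = subsampled_newton_step(x, grads, hessians, batch)
```

Note that this plain step discards past subsamples entirely; the filtering view described above differs precisely in that it carries the history of gradient and Hessian observations forward through the posterior.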