We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and Kalman filters, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated using natural gradients. Different candidate distributions result in different algorithms, and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.
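To make the key idea concrete, below is a minimal sketch (not the paper's code) of the Bayesian learning rule with a diagonal Gaussian candidate distribution and a delta-style approximation of the expected gradient and Hessian. Under these assumptions the natural-gradient update on the candidate's natural parameters reduces to a running average of curvature for the precision and a preconditioned, Newton-like step for the mean, which is the kind of update the abstract alludes to. All function and variable names here are illustrative assumptions, not part of the paper.

```python
import numpy as np

# Sketch: Bayesian learning rule with a diagonal Gaussian candidate
# q(theta) = N(m, diag(1/s)), using the delta approximation
# E_q[grad] ~= grad(m) and E_q[Hessian] ~= Hessian(m).
# Loss here is a simple least-squares example for illustration only.

def grad_loss(theta, X, y):
    # Gradient of 0.5 * ||X theta - y||^2
    return X.T @ (X @ theta - y)

def hess_diag_loss(theta, X, y):
    # Diagonal of the Hessian of the same loss
    return np.sum(X * X, axis=0)

def bayesian_learning_rule(X, y, steps=100, rho=0.1):
    d = X.shape[1]
    m = np.zeros(d)   # mean of the Gaussian candidate
    s = np.ones(d)    # diagonal precision of the candidate
    for _ in range(steps):
        g = grad_loss(m, X, y)
        h = hess_diag_loss(m, X, y)
        # Natural-gradient update on the Gaussian's natural parameters:
        # the precision becomes a running average of curvature, and the
        # mean takes a step preconditioned by that precision.
        s = (1 - rho) * s + rho * h
        m = m - rho * g / s
    return m, s

# Toy usage: m approaches the least-squares solution,
# and 1/s gives an approximate posterior variance per coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
m, s = bayesian_learning_rule(X, y)
print(m, 1.0 / s)
```

Swapping in other candidate distributions or coarser approximations of the natural gradient changes only the update above, which is how the single rule specializes to the different algorithms listed in the abstract.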