Adaptive methods do not have a direct generalization to manifolds, as the adaptive term is not invariant. Momentum methods on manifolds suffer from efficiency problems stemming from the curvature of the manifold. We introduce a framework that generalizes adaptive and momentum methods to arbitrary manifolds by noting that, for every differentiable manifold, there exists a radially convex open set that covers almost all of the manifold. Being radially convex, this set is diffeomorphic to $\mathbb{R}^n$. This gives a natural generalization of any adaptive or momentum-based algorithm to a set that covers almost all of an arbitrary manifold. We also show how to extend these methods to the setting of gradient descent with a retraction. For the implementation, we present an approximation to the matrix exponential that requires just 5 matrix multiplications, making it particularly efficient on GPUs. In practice, we observe that this family of algorithms closes the numerical gap created by an incorrect use of momentum and adaptive methods on manifolds. At the same time, we find that the most efficient algorithm of this family is obtained by simply pulling back the problem to the tangent space at the initial point via the exponential map.
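To make the last point concrete, the following is a minimal sketch, not the paper's implementation, of pulling a problem back to the tangent space at an initial point via the exponential map and running an off-the-shelf adaptive method there. It assumes PyTorch, the orthogonal group as the manifold, and plain Adam; `loss_fn` and all dimensions are placeholders.

```python
import torch

n = 8
# Initial point Q0 on the orthogonal group (from a QR factorization of a random matrix).
Q0 = torch.linalg.qr(torch.randn(n, n)).Q
# Unconstrained parameter living in flat Euclidean space R^{n x n}.
A = torch.zeros(n, n, requires_grad=True)

def to_manifold(A):
    # Skew-symmetrize A to obtain a tangent vector, then map it onto the
    # manifold through the matrix exponential, anchored at the initial point Q0.
    X = A - A.T
    return Q0 @ torch.linalg.matrix_exp(X)

def loss_fn(Q):
    # Placeholder objective defined on the manifold.
    return (Q.sum() - 1.0) ** 2

# Adam (or any adaptive / momentum method) operates on the flat parameter A,
# while the objective is always evaluated at a point on the manifold.
opt = torch.optim.Adam([A], lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(to_manifold(A))
    loss.backward()
    opt.step()
```

The design choice illustrated here is that the constraint handling lives entirely in `to_manifold`, so the optimizer itself never needs to be modified.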