We consider the problem of minimizing a function over the manifold of orthogonal matrices. The majority of algorithms for this problem compute a direction in the tangent space, and then use a retraction to move in that direction while staying on the manifold. Unfortunately, the numerical computation of retractions on the orthogonal manifold always involves some expensive linear algebra operation, such as a matrix inversion, exponential, or square root. These operations quickly become expensive as the dimension of the matrices grows. To bypass this limitation, we propose the landing algorithm, which does not use retractions. The algorithm is not constrained to stay on the manifold, but its evolution is driven by a potential energy which progressively attracts it towards the manifold. One iteration of the landing algorithm only involves matrix multiplications, which makes it cheap compared to its retraction counterparts. We provide an analysis of the convergence of the algorithm, and demonstrate its promise on large-scale and deep learning problems, where it is faster and less prone to numerical errors than retraction-based methods.
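To make the idea concrete, here is a minimal NumPy sketch of one retraction-free update of the kind described above. The specific form of the update, combining a skew-symmetric "relative gradient" term with the gradient of a penalty attracting the iterate back to the manifold, is an assumption for illustration and is not spelled out in this abstract; the names `landing_step`, `grad_f`, `step`, and `lam` are hypothetical. Note that the update uses only matrix multiplications: no inverse, exponential, or square root.

```python
import numpy as np

def landing_step(X, grad_f, step=0.1, lam=1.0):
    """One hypothetical retraction-free update for minimizing f over orthogonal matrices.

    The direction is the sum of a tangent-like term, skew(grad_f(X) X^T) X,
    and the gradient of the penalty N(X) = 1/4 * ||X X^T - I||_F^2,
    which pulls X back towards the orthogonal manifold.
    """
    G = grad_f(X)                                   # Euclidean gradient of f at X
    skew = 0.5 * (G @ X.T - X @ G.T)                # skew-symmetric part of G X^T
    descent = skew @ X                              # tangent-like descent component
    attract = (X @ X.T - np.eye(X.shape[0])) @ X    # gradient of the penalty N
    return X - step * (descent + lam * attract)

# Toy usage: minimize f(X) = 1/2 * ||X - A||_F^2 with A orthogonal,
# whose minimizer over the orthogonal manifold is X = A.
rng = np.random.default_rng(0)
A = np.linalg.qr(rng.standard_normal((50, 50)))[0]
X = np.linalg.qr(rng.standard_normal((50, 50)))[0]
for _ in range(200):
    X = landing_step(X, lambda Y: Y - A)
print("objective:", 0.5 * np.linalg.norm(X - A) ** 2)
print("distance to manifold:", np.linalg.norm(X @ X.T - np.eye(50)))
```

In this sketch the step size and the penalty weight `lam` are fixed by hand; in practice they would have to be chosen small enough for the iterates to remain in a neighborhood of the manifold.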