The Cartesian reverse derivative is a categorical generalization of reverse-mode automatic differentiation. We use this operator to generalize several optimization algorithms, including a straightforward generalization of gradient descent and a novel generalization of Newton's method. We then explore which properties of these algorithms are preserved in this generalized setting. First, we show that the transformation invariances of these algorithms are preserved: while generalized Newton's method is invariant to all invertible linear transformations, generalized gradient descent is invariant only to orthogonal linear transformations. Next, we show that we can express the change in loss of generalized gradient descent with an inner product-like expression, thereby generalizing the non-increasing and convergence properties of the gradient descent optimization flow. Finally, we include several numerical experiments to illustrate the ideas in the paper and demonstrate how we can use them to optimize polynomial functions over an ordered ring.
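To make the generalized update concrete, the following is a minimal sketch (not the paper's implementation) of gradient descent driven by a reverse-derivative-style operator R[f] : A × B → A. For a polynomial map f, R[f](x, v) is the transposed Jacobian of f at x applied to the cotangent v, so the update x_{k+1} = x_k − ε·R[f](x_k, 1) recovers ordinary gradient descent over the reals while remaining meaningful over other ordered rings. The loss f, its reverse derivative, the step size, and the starting point below are illustrative choices; rationals (via Fraction) stand in for an ordered ring.

```python
# Sketch: generalized gradient descent via a reverse derivative R[f](x, v) = J_f(x)^T v.
# All concrete choices (f, eps, start point) are illustrative, not from the paper.

from fractions import Fraction

def f(x):
    # Example polynomial loss: f(x0, x1) = (x0 - 3)^2 + (x1 + 2)^2
    return (x[0] - 3) ** 2 + (x[1] + 2) ** 2

def reverse_derivative_f(x, v):
    # R[f](x, v) = J_f(x)^T v, written out by hand for the polynomial above.
    return [2 * (x[0] - 3) * v, 2 * (x[1] + 2) * v]

def generalized_gradient_descent(x0, eps, steps):
    # x_{k+1} = x_k - eps * R[f](x_k, 1); every quantity stays in the ring.
    x = list(x0)
    for _ in range(steps):
        r = reverse_derivative_f(x, Fraction(1))
        x = [xi - eps * ri for xi, ri in zip(x, r)]
    return x

if __name__ == "__main__":
    start = [Fraction(0), Fraction(0)]
    result = generalized_gradient_descent(start, eps=Fraction(1, 4), steps=20)
    print(result, f(result))  # approaches the minimizer (3, -2); loss tends to 0
```

With ε = 1/4 the iterates contract toward (3, −2) at rate 1/2 per step, mirroring the non-increasing-loss behaviour the abstract describes for the generalized gradient descent flow.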