We present a new algorithm for automatically bounding the Taylor remainder series. In the special case of a scalar function $f: \mathbb{R} \to \mathbb{R}$, our algorithm takes as input a reference point $x_0$, trust region $[a, b]$, and integer $k \ge 1$, and returns an interval $I$ such that $f(x) - \sum_{i=0}^{k-1} \frac {1} {i!} f^{(i)}(x_0) (x - x_0)^i \in I (x - x_0)^k$ for all $x \in [a, b]$. As in automatic differentiation, the function $f$ is provided to the algorithm in symbolic form, and must be composed of known atomic functions. At a high level, our algorithm has two steps. First, for a variety of commonly used elementary functions (e.g., $\exp$, $\log$), we derive sharp polynomial upper and lower bounds on the Taylor remainder series. We then recursively combine the bounds for the elementary functions using an interval arithmetic variant of Taylor-mode automatic differentiation. Our algorithm can make efficient use of machine learning hardware accelerators, and we provide an open source implementation in JAX. We then turn our attention to applications. Most notably, we use our new machinery to create the first universal majorization-minimization optimization algorithms: algorithms that iteratively minimize an arbitrary loss using a majorizer that is derived automatically, rather than by hand. Applied to machine learning, this leads to architecture-specific optimizers for training deep networks that converge from any starting point, without hyperparameter tuning. Our experiments show that for some optimization problems, these hyperparameter-free optimizers outperform tuned versions of gradient descent, Adam, and AdaGrad. We also show that our automatically derived bounds can be used for verified global optimization and numerical integration, and to prove sharper versions of Jensen's inequality.
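As a minimal illustration of the interval guarantee (a sketch using the classical Lagrange form of the remainder, not the paper's sharp bounds or JAX implementation; the function `exp_remainder_interval` and all constants below are ours), take $f = \exp$ and $k = 2$. The Lagrange remainder gives $\exp(x) - \exp(x_0) - \exp(x_0)(x - x_0) = \tfrac{1}{2}\exp(\xi)(x - x_0)^2$ for some $\xi$ between $x_0$ and $x$, so $I = [\tfrac{1}{2}e^a, \tfrac{1}{2}e^b]$ is a valid (though not sharp) interval on the trust region $[a, b]$:

```python
import math

def exp_remainder_interval(x0, a, b):
    # Hypothetical helper (not from the paper's library): a valid k = 2
    # remainder interval for f = exp on the trust region [a, b], via the
    # Lagrange form exp(x) - exp(x0) - exp(x0)*(x - x0) = exp(xi)/2 * (x - x0)^2
    # with xi in [a, b].
    assert a <= x0 <= b
    return (math.exp(a) / 2.0, math.exp(b) / 2.0)

# Numerically check the containment guarantee on a grid of x values.
x0, a, b = 0.0, -1.0, 1.0
lo, hi = exp_remainder_interval(x0, a, b)
for i in range(201):
    x = a + (b - a) * i / 200
    if x == x0:
        continue
    # First-order Taylor remainder of exp at x0, divided by (x - x0)^2,
    # must land inside the interval [lo, hi].
    remainder = math.exp(x) - math.exp(x0) - math.exp(x0) * (x - x0)
    ratio = remainder / (x - x0) ** 2
    assert lo - 1e-12 <= ratio <= hi + 1e-12
```

The paper's sharp per-function bounds are tighter than this Lagrange interval (e.g., for $\exp$ the ratio above is monotone in $x$, so its exact range is attained at the endpoints), and its algorithm composes such bounds recursively rather than bounding the whole composition at once.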