For strongly convex objectives that are smooth, the classical theory of gradient descent ensures linear convergence relative to the number of gradient evaluations. An analogous nonsmooth theory is challenging. Even when the objective is smooth at every iterate, the corresponding local models are unstable, and the number of cutting planes invoked by traditional remedies is difficult to bound, leading to convergence guarantees that are sublinear relative to the cumulative number of gradient evaluations. We instead propose a multipoint generalization of the gradient descent iteration for local optimization. While designed with general objectives in mind, we are motivated by a ``max-of-smooth'' model that captures the subdifferential dimension at optimality. We prove linear convergence when the objective is itself max-of-smooth, and experiments suggest a more general phenomenon.
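To fix ideas, the two regimes contrasted above can be sketched in standard notation (the symbols $\mu$, $L$, and $f_i$ below are illustrative, not taken from the paper). For an $L$-smooth, $\mu$-strongly convex objective $f$, gradient descent with step size $1/L$,
\[
x_{t+1} \;=\; x_t - \tfrac{1}{L}\nabla f(x_t),
\qquad
f(x_{t+1}) - f^\star \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)\bigl(f(x_t) - f^\star\bigr),
\]
converges linearly in the number of gradient evaluations. A max-of-smooth objective instead has the form
\[
f(x) \;=\; \max_{i=1,\dots,k} f_i(x),
\]
with each $f_i$ smooth; where several of the $f_i$ attain the maximum, $f$ is typically nonsmooth even though every evaluated gradient comes from a smooth piece.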