For strongly convex objectives that are smooth, the classical theory of gradient descent ensures linear convergence relative to the number of gradient evaluations. An analogous nonsmooth theory is challenging: even when the objective is smooth at every iterate, the corresponding local models are unstable, and traditional remedies need unpredictably many cutting planes. We instead propose a multipoint generalization of the gradient descent iteration for local optimization. While designed with general objectives in mind, we are motivated by a "max-of-smooth" model that captures subdifferential dimension at optimality. We prove linear convergence when the objective is itself max-of-smooth, and experiments suggest a more general phenomenon.
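For concreteness, the classical guarantee referred to in the first sentence can be stated as follows; the notation ($f$, $L$, $\mu$, $x_k$) is standard and not drawn from the paper itself. If $f$ is $L$-smooth and $\mu$-strongly convex with minimum value $f^\ast$, then gradient descent with step size $1/L$ satisfies
\[
  x_{k+1} \;=\; x_k - \tfrac{1}{L}\nabla f(x_k),
  \qquad
  f(x_k) - f^\ast \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{k}\,\bigl(f(x_0) - f^\ast\bigr),
\]
so the optimality gap contracts by a fixed factor per gradient evaluation. The "max-of-smooth" model mentioned above refers to objectives of the form $f(x) = \max_{1 \le i \le m} f_i(x)$ with each $f_i$ smooth; such a function is typically nonsmooth at a minimizer where more than one $f_i$ attains the maximum.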