The complexity in large-scale optimization can lie in both handling the objective function and handling the constraint set. In this respect, stochastic Frank-Wolfe algorithms occupy a unique position as they alleviate both computational burdens, by querying only approximate first-order information from the objective and by maintaining feasibility of the iterates without using projections. In this paper, we improve the quality of their first-order information by blending in adaptive gradients. We derive convergence rates and demonstrate the computational advantage of our method over the state-of-the-art stochastic Frank-Wolfe algorithms on both convex and nonconvex objectives. The experiments further show that our method can improve the performance of adaptive gradient algorithms for constrained optimization.
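To make the idea concrete, the following is a minimal sketch, not the paper's exact algorithm, of how a stochastic Frank-Wolfe update can blend in an AdaGrad-style adaptively rescaled gradient estimate. The l1-ball constraint, the averaging weight, the step-size schedule, and the helper names (`lmo_l1_ball`, `adaptive_stochastic_fw`, `stoch_grad`) are illustrative assumptions for this sketch only.

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle over the l1-ball: argmin_{||v||_1 <= radius} <grad, v>."""
    v = np.zeros_like(grad)
    i = np.argmax(np.abs(grad))
    v[i] = -radius * np.sign(grad[i])
    return v

def adaptive_stochastic_fw(stoch_grad, x0, n_iters=1000, radius=1.0, eps=1e-8):
    """Illustrative stochastic Frank-Wolfe loop with an AdaGrad-style rescaled gradient.

    stoch_grad(x) is assumed to return an unbiased stochastic gradient at x.
    """
    x = x0.copy()
    g_avg = np.zeros_like(x0)   # averaged stochastic gradient estimate
    h = np.zeros_like(x0)       # accumulated squared gradients (AdaGrad-style)
    for t in range(1, n_iters + 1):
        g = stoch_grad(x)
        rho = 1.0 / t ** (2.0 / 3.0)         # one common averaging schedule (assumption)
        g_avg = (1 - rho) * g_avg + rho * g
        h += g_avg ** 2
        d = g_avg / (np.sqrt(h) + eps)       # adaptively rescaled first-order information
        v = lmo_l1_ball(d, radius)           # feasible vertex from the LMO, no projection
        gamma = 2.0 / (t + 2)                # standard Frank-Wolfe step size
        x = (1 - gamma) * x + gamma * v      # convex combination keeps x feasible
    return x
```

Because each iterate is a convex combination of feasible points returned by the linear minimization oracle, feasibility is preserved without any projection, while the rescaled direction plays the role of the improved first-order information described above.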