While classical forms of the stochastic gradient descent algorithm treat all coordinates in the same way, we develop a framework allowing adaptive (non-uniform) coordinate sampling to leverage structure in the data. In a non-convex setting, and covering zeroth-order gradient estimates, almost sure convergence as well as non-asymptotic bounds are established. Within the proposed framework, we develop MUSKETEER, an algorithm based on a reinforcement strategy: after collecting information on the noisy gradients, it samples the most promising coordinate (all for one); then it moves along the single direction yielding a significant decrease of the objective (one for all). Numerical experiments on both synthetic and real data confirm the effectiveness of MUSKETEER on large-scale problems.
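The reinforcement idea described in the abstract can be illustrated by a minimal sketch: coordinates are sampled from a softmax over accumulated gradient evidence, and the objective is probed only through function evaluations (zeroth-order finite differences). This is an assumption-laden toy version, not the paper's exact MUSKETEER algorithm; all names, the softmax rule, the EMA reinforcement, and the step-size schedule are illustrative choices.

```python
import numpy as np

def adaptive_coordinate_sgd(f, x0, n_steps=2000, step=0.1, h=1e-4,
                            eta=0.5, ema=0.2, seed=0):
    """Illustrative sketch (not the paper's exact algorithm) of adaptive
    coordinate sampling with zeroth-order gradient estimates."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    gains = np.zeros(d)  # running evidence of each coordinate's usefulness
    for t in range(n_steps):
        # "all for one": sample a coordinate from a softmax over past gains
        logits = eta * gains
        p = np.exp(logits - logits.max())
        p /= p.sum()
        k = rng.choice(d, p=p)
        # zeroth-order (finite-difference) estimate of the k-th partial derivative
        e = np.zeros(d)
        e[k] = h
        g = (f(x + e) - f(x - e)) / (2.0 * h)
        # "one for all": move only along the sampled coordinate
        x[k] -= step / np.sqrt(t + 1.0) * g
        # reinforce coordinates whose gradients are consistently large
        gains[k] = (1.0 - ema) * gains[k] + ema * abs(g)
    return x
```

On an anisotropic quadratic, the sampler naturally concentrates on the steep coordinates first and rebalances as they are driven to zero, which is the intuition behind sampling "the most promising coordinate".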