Model training algorithms which observe a small portion of the training set in each computational step are ubiquitous in practical machine learning, and include both stochastic and online optimization methods. In the vast majority of cases, such algorithms observe the training samples via the gradients of the cost functions they incur. Thus, these methods exploit only the \emph{slope} of the cost functions, i.e., their first-order approximations. To address limitations of gradient-based methods, such as sensitivity to step-size choice in the stochastic setting, or the inability to exploit small function variability in the online setting, several streams of research attempt to exploit more information about the cost functions than just their gradients, via the well-known proximal framework of optimization. However, implementing such methods in practice poses a challenge, since each iteration boils down to computing a proximal operator, which may not be easy. In this work we provide efficient algorithms and corresponding implementations of proximal operators, in order to make experimentation with incremental proximal optimization algorithms accessible to a larger audience of researchers and practitioners, and in particular to promote additional theoretical research into these methods by closing the gap between their theoretical description in research papers and their use in practice. The corresponding code is published at https://github.com/alexshtf/inc_prox_pt.
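For concreteness, a minimal statement of the step in question, using standard notation not fixed in the abstract ($w_t$ is the current iterate, $\eta_t$ the step size, and $f_t$ the cost function revealed at step $t$): each iteration applies the proximal operator of the current cost,
\[
\operatorname{prox}_{\eta_t f_t}(w_t) \;=\; \operatorname*{argmin}_{u} \Bigl\{ f_t(u) + \tfrac{1}{2\eta_t}\,\|u - w_t\|^2 \Bigr\},
\qquad
w_{t+1} = \operatorname{prox}_{\eta_t f_t}(w_t),
\]
so that, in contrast to a gradient step, the entire cost function $f_t$ (rather than only its slope at $w_t$) enters the update.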