We present two sampled quasi-Newton methods (sampled LBFGS and sampled LSR1) for solving empirical risk minimization problems that arise in machine learning. In contrast to the classical variants of these methods, which sequentially build Hessian or inverse Hessian approximations as the optimization progresses, our proposed methods sample points randomly around the current iterate at every iteration to produce these approximations. As a result, the constructed approximations make use of more reliable (recent and local) information and do not depend on past iterate information that could be significantly stale. Our proposed algorithms are efficient in terms of accessed data points (epochs) and have enough concurrency to take advantage of parallel/distributed computing environments. We provide convergence guarantees for our proposed methods. Numerical tests on a toy classification problem, as well as on popular benchmark binary classification and neural network training tasks, reveal that the methods outperform their classical variants.
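To make the core idea concrete, the following is a minimal sketch, not the authors' exact algorithm, of how curvature pairs could be sampled around the current iterate and then used in a standard L-BFGS-style two-loop recursion. The function names (sampled_lbfgs_pairs, two_loop_recursion, grad_fn), the gradient-difference construction of y, the sampling radius, the number of pairs m, and the curvature threshold are all illustrative assumptions.

```python
import numpy as np

def sampled_lbfgs_pairs(w, grad_fn, m=10, radius=0.1, rng=None):
    """Illustrative sketch: build up to m curvature pairs (s_i, y_i) by
    sampling points around the current iterate w, instead of reusing
    pairs accumulated from past iterations.

    grad_fn(x) is assumed to return the gradient of the empirical risk
    at x; m and radius are hypothetical defaults, not the paper's.
    """
    rng = np.random.default_rng() if rng is None else rng
    g_w = grad_fn(w)
    S, Y = [], []
    for _ in range(m):
        # Sample a point on a sphere of the given radius around w.
        d = rng.standard_normal(w.shape)
        s = radius * d / np.linalg.norm(d)
        # Curvature information along direction s via a gradient difference.
        y = grad_fn(w + s) - g_w
        # Keep only pairs satisfying the curvature condition s^T y > 0.
        if s @ y > 1e-10:
            S.append(s)
            Y.append(y)
    return S, Y

def two_loop_recursion(g, S, Y):
    """Standard L-BFGS two-loop recursion applied to the freshly sampled
    pairs: returns an approximation of (inverse Hessian) * g."""
    if not S:
        return g.copy()  # no usable pairs: fall back to the gradient
    q = g.copy()
    rhos = [1.0 / (s @ y) for s, y in zip(S, Y)]
    alphas = []
    # First loop: newest pair to oldest pair.
    for s, y, rho in zip(reversed(S), reversed(Y), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    # Initial scaling from the most recent pair.
    gamma = (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1])
    r = gamma * q
    # Second loop: oldest pair to newest pair.
    for (s, y, rho), a in zip(zip(S, Y, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r += (a - b) * s
    return r  # the search direction would be -r
```

In such a scheme, the pairs are discarded and re-sampled at every iteration, so the quasi-Newton approximation reflects only local curvature around the current iterate; the m gradient evaluations are independent and could be computed in parallel.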