In this paper, we develop a new algorithm, Annealed Skewed SGD (AskewSGD), for training deep neural networks (DNNs) with quantized weights. First, we formulate the training of quantized neural networks (QNNs) as a smoothed sequence of interval-constrained optimization problems. Then, we propose a new first-order stochastic method, AskewSGD, to solve each constrained optimization subproblem. Unlike algorithms with active sets and feasible directions, AskewSGD avoids projections or optimization over the entire feasible set and allows iterates that are infeasible. The numerical complexity of AskewSGD is comparable to that of existing approaches for training QNNs, such as the straight-through gradient estimator used in BinaryConnect, or other state-of-the-art methods (ProxQuant, LUQ). We establish convergence guarantees for AskewSGD under general assumptions on the objective function. Experimental results show that AskewSGD performs better than, or on par with, state-of-the-art methods on classical benchmarks.
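For context, the straight-through gradient estimator mentioned above (as used in BinaryConnect) can be summarized by the following minimal PyTorch sketch. This illustrates only the baseline that AskewSGD is compared against, not AskewSGD itself; the layer shapes and the binarization to {-1, +1} are illustrative assumptions.

```python
import torch

class BinarySTE(torch.autograd.Function):
    """Binarize weights in the forward pass; pass gradients straight through."""

    @staticmethod
    def forward(ctx, w):
        # Forward: quantize real-valued latent weights to {-1, +1}.
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        # Backward: treat the quantizer as the identity (straight-through).
        return grad_output

# Hypothetical usage inside a layer: quantize latent float weights before the matmul.
w = torch.randn(64, 32, requires_grad=True)   # latent full-precision weights
x = torch.randn(8, 32)                        # a batch of inputs
y = x @ BinarySTE.apply(w).t()                # forward pass uses binarized weights
y.sum().backward()                            # gradients flow back to the latent weights w
```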