Current mainstream gradient optimization algorithms neglect or conflate the fluctuation in the gradient's expectation and variance caused by parameter updates between consecutive iterations. This paper remedies this issue by introducing a novel unbiased stratified statistic $\bar{G}_{mst}$, and establishes a sufficient condition for the fast convergence of $\bar{G}_{mst}$. A novel algorithm named MSSG, designed on the basis of $\bar{G}_{mst}$, outperforms other SGD-like algorithms. Theoretical conclusions and experimental evidence strongly suggest employing MSSG when training deep models.
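The abstract does not define $\bar{G}_{mst}$, so as a point of reference only, the following is a minimal sketch of a *generic* unbiased stratified mean estimator for per-example gradients, not the paper's statistic: each stratum's sample mean is weighted by that stratum's population share, which makes the combined estimate unbiased for the full-data mean while typically reducing variance when strata are internally homogeneous. All names (`stratified_mean`, the toy data) are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: a generic unbiased stratified mean estimator,
# NOT the paper's G_mst statistic (its definition is not given in the abstract).
# We estimate the mean of per-example "gradients" g_i by drawing a sample
# from each stratum and weighting each stratum's sample mean by its share
# of the population, which yields an unbiased estimate of the full mean.

rng = np.random.default_rng(0)

def stratified_mean(g, strata, n_per_stratum, rng):
    """Unbiased estimate of g.mean(axis=0) via stratified sampling.

    g:      (N, d) array of per-example gradients (toy stand-ins here).
    strata: list of index arrays partitioning range(N).
    """
    N = g.shape[0]
    est = np.zeros(g.shape[1])
    for idx in strata:
        sample = rng.choice(idx, size=n_per_stratum, replace=False)
        # Weight the stratum's sample mean by its population share.
        est += (len(idx) / N) * g[sample].mean(axis=0)
    return est

# Toy data: two strata whose gradients have very different scales.
g = np.concatenate([rng.normal(0.0, 1.0, (600, 3)),
                    rng.normal(5.0, 1.0, (400, 3))])
strata = [np.arange(600), np.arange(600, 1000)]

est = stratified_mean(g, strata, n_per_stratum=50, rng=rng)
print(est)  # close to the full-data mean g.mean(axis=0)
```

Stratifying by gradient scale is what drives the variance reduction: a plain uniform sample of 100 examples mixes the two scales at random, while the stratified draw fixes how many come from each stratum, removing the between-strata component of the sampling variance.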