Current mainstream gradient optimization algorithms neglect, or conflate, the fluctuation of gradient expectation and variance caused by parameter updates between consecutive iterations. Exploiting this fluctuation effect, combined with a stratified sampling strategy, this paper designs a novel \underline{M}emory \underline{S}tochastic s\underline{T}ratified Gradient Descent (\underline{MST}GD) algorithm with an exponential convergence rate. Specifically, MSTGD uses two strategies for variance reduction: the first reduces variance according to the proportion $p$ of historical gradients reused, where $p$ is estimated from the mean and variance of sample gradients before and after an iteration; the second is stratified sampling by category. The statistic $\bar{G}_{mst}$ designed under these two strategies is adaptively unbiased, and its variance decays at a geometric rate. This enables MSTGD, built on $\bar{G}_{mst}$, to attain an exponential convergence rate of the form $\lambda^{2(k-k_0)}$, where $\lambda\in(0,1)$ depends on the proportion $p$ and $k$ is the number of iteration steps. Unlike most other algorithms that claim an exponential convergence rate, this rate is independent of parameters such as the dataset size $N$ and the batch size $n$, and is achieved with a constant step size. Theoretical and experimental results demonstrate the effectiveness of MSTGD.
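To make the two strategies concrete, the following is a minimal Python sketch of one MSTGD-style update under our reading of the abstract. The names `mstgd_step`, `grad_fn`, and `per_class`, as well as the specific mixing rule $p\,\bar{g}_{\text{hist}} + (1-p)\,\bar{g}_{\text{new}}$, are illustrative assumptions rather than the paper's exact definitions.

```python
import numpy as np

def stratified_batch(X, y, per_class):
    """Draw a class-stratified minibatch: `per_class` samples from each label."""
    idx = np.concatenate([
        np.random.choice(np.flatnonzero(y == c), per_class, replace=False)
        for c in np.unique(y)
    ])
    return X[idx], y[idx]

def mstgd_step(w, grad_fn, X, y, g_hist, p, lr, per_class=8):
    """One hypothetical MSTGD-style update (a sketch, not the paper's code).

    Mixes a remembered historical gradient `g_hist` with a fresh
    stratified-minibatch gradient, weighted by the proportion `p`,
    to form the variance-reduced statistic G_mst.
    """
    Xb, yb = stratified_batch(X, y, per_class)   # strategy 2: stratify by category
    g_new = grad_fn(w, Xb, yb)                   # fresh stratified gradient
    g_mst = p * g_hist + (1.0 - p) * g_new       # strategy 1: reuse proportion p of memory
    w = w - lr * g_mst                           # constant step size, per the abstract
    return w, g_mst                              # g_mst becomes next step's memory
```

In this sketch `p` is passed in as a constant for simplicity; in the paper it is estimated adaptively from the mean and variance of sample gradients before and after the iteration.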