We study stochastic algorithms in a streaming framework, trained on samples coming from a dependent data source. In this streaming framework, we analyze the convergence of Stochastic Gradient (SG) methods in a non-asymptotic manner; this includes various SG methods such as the well-known stochastic gradient descent (i.e., the Robbins-Monro algorithm), mini-batch SG methods, together with their averaged estimates (i.e., Polyak-Ruppert averaging). Our results form a heuristic that links the level of dependence and convexity to the remaining model parameters. This heuristic provides new insights into choosing the optimal learning rate, which can help increase the stability of SG-based methods; these investigations suggest large streaming batches with slowly decaying learning rates for highly dependent data sources.
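To make the setting concrete, the following is a minimal sketch (not the paper's exact procedure) of mini-batch streaming SG with Polyak-Ruppert averaging and a polynomially decaying learning rate gamma_t = lr0 * t^(-alpha). The function names streaming_sgd and make_stream, the AR(1) data-generating process, and the parameter values lr0 and alpha are illustrative assumptions, not taken from the paper.

import numpy as np

def streaming_sgd(grad, theta0, stream, lr0=0.1, alpha=2/3):
    # grad(theta, batch): stochastic gradient of the objective on one mini-batch.
    # stream: iterable of mini-batches, possibly built from dependent samples.
    # lr0, alpha: learning rate gamma_t = lr0 * t**(-alpha); a smaller alpha
    # (slower decay) is the regime suggested for highly dependent data.
    theta = np.asarray(theta0, dtype=float)
    theta_bar = theta.copy()                      # Polyak-Ruppert average of the iterates
    for t, batch in enumerate(stream, start=1):
        gamma_t = lr0 * t ** (-alpha)             # decaying learning rate
        theta = theta - gamma_t * grad(theta, batch)   # SG step on the current mini-batch
        theta_bar += (theta - theta_bar) / t           # running average theta_bar_t
    return theta, theta_bar

# Hypothetical usage: streaming least squares on AR(1)-correlated covariates.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -0.5])

def make_stream(n_batches=500, batch_size=8, rho=0.9):
    x = np.zeros(2)
    for _ in range(n_batches):
        X, y = [], []
        for _ in range(batch_size):
            x = rho * x + rng.normal(size=2)      # dependent (AR(1)) covariates
            X.append(x.copy())
            y.append(x @ theta_star + rng.normal())
        yield np.array(X), np.array(y)

def grad(theta, batch):
    X, y = batch
    return X.T @ (X @ theta - y) / len(y)         # least-squares mini-batch gradient

theta_last, theta_avg = streaming_sgd(grad, np.zeros(2), make_stream())

In line with the abstract's heuristic, the knobs one would turn when the dependence (here, rho) is strong are a larger batch_size and a smaller alpha, i.e., larger streaming batches with a slower-decaying learning rate.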