Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions and large datasets. We address the bottlenecks that arise when using both shared and distributed memory: the former is typically bounded by limited computational resources and bandwidth, whereas the latter suffers from communication overheads. We propose a unified distributed and parallel implementation of SGD (named DPSGD) that relies on both asynchronous distribution and lock-free parallelism. By combining the two strategies in a unified framework, DPSGD is able to strike a better trade-off between local computation and communication. The convergence properties of DPSGD are studied for non-convex problems such as those arising in statistical modelling and machine learning. Our theoretical analysis shows that DPSGD achieves a speed-up with respect to both the number of cores and the number of workers while guaranteeing an asymptotic convergence rate of $O(1/\sqrt{T})$, provided that the number of cores is bounded by $T^{1/4}$ and the number of workers is bounded by $T^{1/2}$, where $T$ is the number of iterations. The potential gains achievable by DPSGD are demonstrated empirically on a stochastic variational inference problem (Latent Dirichlet Allocation) and on a deep reinforcement learning (DRL) problem (advantage actor-critic, A2C), resulting in two algorithms: DPSVI and HSA2C. Empirical results validate our theoretical findings. Comparative studies are conducted to assess the performance of the proposed DPSGD against state-of-the-art DRL algorithms.
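To make the shared-memory side of the combination concrete, the following is a minimal, hypothetical Python sketch of the lock-free (Hogwild-style) inner loop that each worker would run locally; the toy least-squares objective, the function names, and the thread counts are illustrative assumptions, not the paper's implementation, and the asynchronous exchange of parameters between distributed workers is only indicated in a comment.

```python
import random
import threading

import numpy as np

# Toy least-squares objective used only for illustration (assumed, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = rng.normal(size=1000)
w = np.zeros(10)  # shared parameters, updated without locks (Hogwild-style)

def stochastic_grad(w, i):
    # Gradient of 0.5 * (x_i . w - y_i)^2 for one sampled example.
    return (X[i] @ w - y[i]) * X[i]

def lockfree_core(steps, lr=1e-3):
    # One "core" inside a worker: reads and writes the shared w with no locks.
    # In the distributed setting, each worker would additionally push/pull w
    # asynchronously (e.g. via a parameter server); that exchange is omitted here.
    for _ in range(steps):
        i = random.randrange(len(y))
        w[:] -= lr * stochastic_grad(w, i)  # in-place, lock-free update

threads = [threading.Thread(target=lockfree_core, args=(500,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final loss:", 0.5 * np.mean((X @ w - y) ** 2))
```

Under this sketch, the trade-off discussed above corresponds to choosing how many lock-free threads run per worker versus how often workers exchange parameters over the network.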