Several low-bandwidth, distributable black-box optimization algorithms have recently been shown to perform nearly as well as more refined modern methods in some Deep Reinforcement Learning domains. In this work we investigate a core problem with the use of distributed workers in such systems. Further, we investigate the dramatic differences in performance between the popular Adam optimization algorithm and plain stochastic gradient descent (SGD). These investigations produce a stable, low-bandwidth learning algorithm that achieves full (100\%) utilization of all connected CPUs under typical conditions.
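For reference, the comparison above is between the standard update rules of plain SGD and Adam; the notation below is the conventional one (step size $\alpha$, gradient estimate $g_t$ at step $t$, and Adam's hyperparameters $\beta_1$, $\beta_2$, $\epsilon$) and is not specific to this work's variant of either method.
\begin{align*}
\text{SGD:}\quad \theta_{t+1} &= \theta_t - \alpha\, g_t,\\[4pt]
\text{Adam:}\quad m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t,\\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2},\\
\hat m_t &= m_t/(1-\beta_1^{t}), \qquad \hat v_t = v_t/(1-\beta_2^{t}),\\
\theta_{t+1} &= \theta_t - \alpha\, \hat m_t \big/ \left(\sqrt{\hat v_t} + \epsilon\right).
\end{align*}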