We study asynchronous finite-sum minimization in a distributed-data setting with a central parameter server. Asynchrony is well understood in parallel settings where the data is accessible by all machines (e.g., modifications of variance-reduced gradient algorithms like SAGA work well), but little is known for the distributed-data setting. We develop ADSAGA, an algorithm based on SAGA for the distributed-data setting, in which the data is partitioned between many machines. We show that with $m$ machines, under a natural stochastic delay model with a mean delay of $m$, ADSAGA converges in $\tilde{O}\left(\left(n + \sqrt{m}\kappa\right)\log(1/\epsilon)\right)$ iterations, where $n$ is the number of component functions and $\kappa$ is a condition number. This complexity sits squarely between the complexity $\tilde{O}\left(\left(n + \kappa\right)\log(1/\epsilon)\right)$ of SAGA \textit{without delays} and the complexity $\tilde{O}\left(\left(n + m\kappa\right)\log(1/\epsilon)\right)$ of parallel asynchronous algorithms where the delays are \textit{arbitrary} (but bounded by $O(m)$) and the data is accessible by all. Existing asynchronous algorithms for the distributed-data setting with arbitrary delays have only been shown to converge in $\tilde{O}(n^2\kappa\log(1/\epsilon))$ iterations. On least-squares problems, we empirically compare the iteration complexity and wallclock performance of ADSAGA to existing parallel and distributed algorithms, including synchronous minibatch algorithms. Our results demonstrate the wallclock advantage of variance-reduced asynchronous approaches over SGD or synchronous approaches.
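To give a rough feel for how the four iteration bounds quoted in the abstract compare, the following minimal Python sketch evaluates them for chosen values of $n$, $m$, $\kappa$, and $\epsilon$. It is only an illustration under stated assumptions: the constants and polylogarithmic factors hidden by $\tilde{O}$ are dropped, the function name `iteration_bounds` and the dictionary keys are hypothetical, and the sample values are arbitrary rather than taken from the paper's experiments.

```python
import math


def iteration_bounds(n, m, kappa, eps):
    """Evaluate the abstract's iteration bounds with all hidden
    constants and polylog factors set to 1 (an assumption made
    purely for illustration)."""
    log_term = math.log(1.0 / eps)
    return {
        # SAGA without delays: (n + kappa) log(1/eps)
        "saga_no_delay": (n + kappa) * log_term,
        # ADSAGA, stochastic delays with mean m: (n + sqrt(m) kappa) log(1/eps)
        "adsaga": (n + math.sqrt(m) * kappa) * log_term,
        # Parallel asynchronous, arbitrary delays bounded by O(m): (n + m kappa) log(1/eps)
        "parallel_arbitrary_delay": (n + m * kappa) * log_term,
        # Existing distributed-data result, arbitrary delays: n^2 kappa log(1/eps)
        "distributed_arbitrary_delay": n**2 * kappa * log_term,
    }


if __name__ == "__main__":
    # Arbitrary sample values, not from the paper.
    for name, value in iteration_bounds(n=1000, m=16, kappa=100, eps=1e-6).items():
        print(f"{name}: {value:.3e}")
```

With $m = 1$ the first three expressions coincide, and the gap between them grows with $m$ only through the $\kappa$ term, which is why the comparison matters most for ill-conditioned problems.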