Coded distributed computation has become common practice for performing gradient descent on large datasets in order to mitigate stragglers and other faults. This paper proposes a novel algorithm that encodes the partial derivatives themselves and, furthermore, optimizes the codes by performing lossy compression on the derivative codewords: maximizing the information contained in each codeword while minimizing the information shared between codewords. The utility of this application of coding theory is a geometric consequence of a fact observed in optimization research: noise is tolerable, and sometimes even helpful, in gradient-descent-based learning algorithms, since it helps avoid overfitting and local minima. This stands in contrast with much of the current work on distributed coded computation, which focuses on recovering all of the data from the workers. A second contribution is that the low-weight nature of the coding scheme permits asynchronous gradient updates, since the code can be decoded iteratively; that is, a worker's result can be incorporated into the overall gradient as soon as it arrives. Finally, the directional derivative is always a linear function of the direction vector; our framework is therefore robust, since it can apply linear coding techniques to general machine learning frameworks such as deep neural networks.
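To make the linearity argument concrete, the sketch below illustrates (under assumptions not specified in the abstract: a simple random linear code, finite-difference workers, and least-squares decoding) how coded directional derivatives can be partially decoded into a noisy gradient estimate even when some workers straggle. The function and helper names are hypothetical and chosen for illustration only; this is not the paper's algorithm.

```python
# Illustrative sketch only: since D_v f(x) = <grad f(x), v> is linear in the
# direction v, a linear code applied to direction vectors carries over to the
# workers' scalar results, and a subset of results suffices for an estimate.
import numpy as np

def f(x):
    # Toy differentiable objective standing in for a training loss.
    return 0.5 * np.sum(x ** 2) + np.sin(x[0])

def directional_derivative(f, x, v, eps=1e-6):
    # Central finite difference approximating D_v f(x) = <grad f(x), v>.
    return (f(x + eps * v) - f(x - eps * v)) / (2 * eps)

def encode_directions(dim, num_workers, rng):
    # Random linear code: worker i evaluates the coded direction G[i].
    return rng.standard_normal((num_workers, dim))

def estimate_gradient(G, responses, received):
    # Partial (iterative-style) decoding: use only the workers that have
    # responded; least squares tolerates stragglers at the cost of noise.
    idx = np.flatnonzero(received)
    g_hat, *_ = np.linalg.lstsq(G[idx], responses[idx], rcond=None)
    return g_hat

rng = np.random.default_rng(0)
dim, num_workers = 5, 8
x = rng.standard_normal(dim)
G = encode_directions(dim, num_workers, rng)

# Each worker computes one directional derivative along its coded direction.
responses = np.array([directional_derivative(f, x, G[i]) for i in range(num_workers)])

# Suppose only 6 of the 8 workers return in time (stragglers dropped).
received = np.array([1, 1, 1, 0, 1, 1, 1, 0], dtype=bool)
print(estimate_gradient(G, responses, received))
```

In this toy setting the noise in the recovered gradient comes from both the finite-difference approximation and the missing workers, which is exactly the kind of noise the abstract argues is tolerable in gradient-descent-based learning.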