The stochastic gradient descent (SGD) algorithm and its variants have been used effectively to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice because of its inherently sequential optimization of the error function. This has led to the development of parallel SGD algorithms, such as asynchronous SGD (ASGD) and synchronous SGD (SSGD), to train deep neural networks. Parallelization, however, introduces high variance due to the delay in parameter (weight) updates. Our proposed algorithm addresses this delay and attempts to minimize its impact. We employ guided SGD (gSGD), which encourages consistent examples to steer the convergence by compensating for the unpredictable deviation caused by the delay. Its convergence rate is similar to that of A/SSGD; however, some additional (parallel) processing is required to compensate for the delay. The experimental results demonstrate that our proposed approach mitigates the impact of the delay on classification accuracy. The guided approach with SSGD clearly outperforms A/SSGD and, for some benchmark datasets, even achieves accuracy close to that of sequential SGD.
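To make the delay problem concrete, the sketch below simulates SGD updates computed from stale parameters, the situation that arises in asynchronous/parallel training and that the abstract identifies as the source of extra variance. This is only an illustrative toy on a least-squares objective, not the paper's gSGD or its delay-compensation scheme; the function name `simulate_delayed_sgd` and the `delay` parameter are assumptions introduced here for exposition.

```python
# Toy illustration (not the paper's method): gradients are evaluated on
# parameters that are `delay` steps old, mimicking stale updates in ASGD.
import numpy as np

def simulate_delayed_sgd(X, y, lr=0.01, delay=0, steps=2000, seed=0):
    """SGD where each gradient is computed at parameters `delay` steps old."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    history = [w.copy()]                          # past parameter snapshots
    for _ in range(steps):
        i = rng.integers(len(X))                  # pick one training example
        w_stale = history[max(0, len(history) - 1 - delay)]
        grad = (X[i] @ w_stale - y[i]) * X[i]     # gradient at stale parameters
        w = w - lr * grad                         # applied to current parameters
        history.append(w.copy())
    return w

# Toy regression data: y = X @ w_true + noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=500)

w_seq = simulate_delayed_sgd(X, y, delay=0)      # sequential SGD (no staleness)
w_stale = simulate_delayed_sgd(X, y, delay=8)    # heavily delayed updates
print("parameter error, delay=0:", np.linalg.norm(w_seq - w_true))
print("parameter error, delay=8:", np.linalg.norm(w_stale - w_true))
```

Running the sketch typically shows a larger parameter error for the delayed run, which is the kind of degradation the proposed guided approach is designed to compensate for.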