Deep Neural Network (DNN) models are usually trained sequentially, one layer after another, which gives rise to the forward, backward and update locking problems and leads to poor training-time performance. Existing parallel strategies for mitigating these problems deliver suboptimal runtime performance. In this work, we propose a novel layer-wise partitioning and merging framework with parallel forward and backward passes to improve training performance. The novelty of the proposed work is twofold: 1) a layer-wise partitioning and merging model that minimises communication overhead between devices without incurring the memory cost of existing strategies during training; 2) a forward-pass and backward-pass parallelisation and optimisation that addresses the update locking problem and minimises the total training cost. Experimental evaluation on real use cases shows that the proposed method outperforms state-of-the-art approaches in training speed and achieves almost linear speedup without compromising the accuracy of the non-parallel approach.
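To make the idea of layer-wise partitioning concrete, the following is a minimal, illustrative sketch (not the authors' implementation): a model's layers are split into contiguous stages, each stage is pinned to its own device, and micro-batches are streamed through the stages so that, with real accelerators and asynchronous execution, different stages can work on different micro-batches concurrently. The names `Stage`, `partition_layers` and `DEVICES` are hypothetical, and the parallelisation shown here is a generic pipelined layout rather than the specific forward/backward-pass scheme proposed in the paper.

```python
# Hypothetical sketch of layer-wise partitioning across devices.
import torch
import torch.nn as nn

DEVICES = ["cpu", "cpu"]  # assumption: replace with e.g. ["cuda:0", "cuda:1"]

def partition_layers(layers, num_stages):
    """Split a list of layers into `num_stages` contiguous groups."""
    per_stage = (len(layers) + num_stages - 1) // num_stages
    return [layers[i:i + per_stage] for i in range(0, len(layers), per_stage)]

class Stage(nn.Module):
    """One partition of the model, pinned to a single device."""
    def __init__(self, layers, device):
        super().__init__()
        self.block = nn.Sequential(*layers).to(device)
        self.device = device

    def forward(self, x):
        return self.block(x.to(self.device))

# Toy model: four layers split into two stages on two devices.
layers = [nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 16), nn.ReLU()]
stages = [Stage(group, dev)
          for group, dev in zip(partition_layers(layers, 2), DEVICES)]

# Micro-batches flow through the stages; on real devices, stage i can start
# micro-batch k+1 while stage i+1 is still processing micro-batch k.
micro_batches = torch.randn(8, 32).chunk(2)
outputs = []
for mb in micro_batches:
    h = mb
    for stage in stages:
        h = stage(h)
    outputs.append(h)

loss = torch.cat(outputs).sum()
loss.backward()  # gradients propagate back through every stage
```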