State-of-the-art deep neural networks (DNNs) have significant computational and data management requirements, and the sizes of both training data and models continue to grow. Sparsification and pruning methods have been shown to be effective at removing a large fraction of the connections in DNNs. The resulting sparse networks present unique challenges for further improving the computational efficiency of training and inference in deep learning. Both the feedforward (inference) and backpropagation steps of the stochastic gradient descent (SGD) algorithm for training sparse DNNs involve consecutive sparse matrix-vector multiplications (SpMVs). We first introduce a distributed-memory parallel SpMV-based solution for the SGD algorithm to improve its scalability. The parallelization approach is based on row-wise partitioning of the weight matrices that represent the neuron connections between consecutive layers. We then propose a novel hypergraph model for partitioning the weight matrices to reduce the total communication volume and ensure computational load balance among processors. Experiments performed on sparse DNNs demonstrate that the proposed solution is highly efficient and scalable, and that the proposed matrix partitioning scheme further improves its performance significantly.
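To make the SpMV connection concrete, the following is a minimal Python sketch (using NumPy/SciPy, which the paper does not necessarily use) of how a single feedforward step through one sparse layer reduces to an SpMV. The layer sizes, sparsity density, bias, and ReLU activation are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Hypothetical sparse layer: W holds the pruned connections between two
# consecutive layers, stored in CSR format; x is the previous layer's
# activation vector.
rng = np.random.default_rng(0)
n_prev, n_next = 1024, 512
W = sparse_random(n_next, n_prev, density=0.01, format="csr", random_state=0)
x = rng.standard_normal(n_prev)
b = np.zeros(n_next)

# Core feedforward operation: a sparse matrix-vector multiplication (SpMV).
z = W @ x + b            # SpMV plus bias
a = np.maximum(z, 0.0)   # ReLU activation (illustrative choice)

# The backpropagation step similarly multiplies an error vector by the
# transpose of W, i.e., another SpMV per layer.
```

Chaining such SpMVs layer by layer is what makes the distributed row-wise partitioning of each weight matrix, and the hypergraph model that guides it, the natural levers for reducing communication volume and balancing load.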