Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries for scaling Transformers focus on improving training or inference with better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale implements several modeling techniques that can improve modeling generality and capability, as well as training stability and efficiency. Experimental results on language modeling and neural machine translation demonstrate that TorchScale can successfully scale Transformers to different sizes without tears. The library is available at https://aka.ms/torchscale.
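As a quick illustration of how the toolkit is meant to be used, the following is a minimal sketch of constructing a Transformer encoder, assuming the entry points documented in the project README (`EncoderConfig` and `Encoder` under the `torchscale.architecture` package); the class names, the `vocab_size` value, and the `deepnorm` option are assumptions taken from that documentation rather than details stated in this abstract.

```python
# Minimal sketch: build a Transformer encoder with TorchScale.
# Names follow the project's README and may differ across releases.
from torchscale.architecture.config import EncoderConfig
from torchscale.architecture.encoder import Encoder

# Configure a small encoder; stability-oriented options such as
# DeepNet-style normalization are toggled through the config object.
config = EncoderConfig(vocab_size=64000, deepnorm=True)
model = Encoder(config)

print(model)
```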