FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours

Protein structure prediction is an important method for understanding gene translation and protein function in structural biology. AlphaFold introduced the Transformer model to protein structure prediction and achieved atomic accuracy. However, training and inference of the AlphaFold model are time-consuming and expensive because of its distinctive performance characteristics and huge memory consumption. In this paper, we propose FastFold, a highly efficient implementation of the protein structure prediction model for training and inference. FastFold includes a series of GPU optimizations based on a thorough analysis of AlphaFold's performance. Meanwhile, with Dynamic Axial Parallelism and Duality Async Operation, FastFold achieves high model-parallel scaling efficiency, surpassing existing popular model parallelism techniques. Experimental results show that FastFold reduces overall training time from 11 days to 67 hours and achieves a 7.5-9.5X speedup for long-sequence inference. Furthermore, we scaled FastFold to 512 GPUs and achieved an aggregate of 6.02 PetaFLOPs with 90.1% parallel efficiency. The implementation is available at https://github.com/hpcaitech/FastFold.
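As a rough sanity check on the headline numbers, the snippet below turns them into a training speedup and an average per-GPU throughput. It uses only figures quoted in the abstract and assumes the 6.02 PetaFLOPs aggregate is spread evenly across the 512 GPUs. It is back-of-the-envelope arithmetic and does not describe how FastFold achieves these results.

```python
# Back-of-the-envelope figures derived only from numbers quoted in the abstract.
HOURS_PER_DAY = 24

baseline_hours = 11 * HOURS_PER_DAY  # 11 days of AlphaFold training = 264 hours
fastfold_hours = 67                  # reported FastFold training time
training_speedup = baseline_hours / fastfold_hours

aggregate_pflops = 6.02              # reported aggregate throughput (PetaFLOPs)
num_gpus = 512
# Assumption: throughput is split evenly across GPUs (1 PFLOPs = 1000 TFLOPs).
per_gpu_tflops = aggregate_pflops * 1000 / num_gpus

print(f"training speedup: {training_speedup:.2f}x")
print(f"average per-GPU throughput: {per_gpu_tflops:.2f} TFLOPs")
```

Run as-is, this reports a training speedup of roughly 3.9x and an average of roughly 11.8 TFLOPs per GPU.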
