Protein structure prediction helps elucidate gene translation and protein function, and is of growing interest and importance in structural biology. AlphaFold, which uses the Transformer architecture to achieve atomic-level accuracy in protein structure prediction, was a significant breakthrough. However, training and inference with AlphaFold are challenging due to its high computation and memory costs. In this work, we present FastFold, an efficient implementation of AlphaFold for both training and inference. We propose Dynamic Axial Parallelism and Duality Async Operations to improve the scaling efficiency of model parallelism. In addition, we propose AutoChunk, which reduces memory cost by over 80% during inference by automatically determining the chunking strategy. Experimental results show that FastFold reduces the overall training time from 11 days to 67 hours and achieves a 7.5X-9.5X speedup for long-sequence inference. Furthermore, we scale FastFold to 512 GPUs and achieve an aggregate throughput of 6.02 PetaFLOP/s with 90.1% parallel efficiency.
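To give intuition for the chunking idea behind AutoChunk, the sketch below evaluates attention one block of query rows at a time, so the full L x L score matrix is never materialized and peak activation memory drops from O(L^2) to O(chunk x L). This is a minimal illustrative example in NumPy, not FastFold's actual AutoChunk implementation, which determines chunk boundaries automatically over the whole computation graph.

```python
import numpy as np

def chunked_attention(q, k, v, chunk_size=64):
    """Attention computed in row chunks to bound peak memory.

    Processes `chunk_size` query rows at a time instead of building
    the entire (L, L) score matrix at once. Illustrative sketch only;
    assumes single-head attention with q, k, v of shape (L, d).
    """
    L, d = q.shape
    out = np.empty_like(v)
    for start in range(0, L, chunk_size):
        end = min(start + chunk_size, L)
        # Scores for this chunk only: shape (chunk, L), not (L, L).
        scores = q[start:end] @ k.T / np.sqrt(d)
        # Numerically stable softmax over the key axis.
        scores -= scores.max(axis=-1, keepdims=True)
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        out[start:end] = weights @ v
    return out
```

The result is numerically identical to unchunked attention; chunking trades a small amount of kernel-launch overhead for a large reduction in peak memory, which is what makes long-sequence inference feasible.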