Accurate protein structure prediction can significantly accelerate progress in the life sciences. AlphaFold2, a state-of-the-art end-to-end structure prediction system, already achieves accuracy close to that of experimental structure determination techniques. However, because of its complex model architecture and large memory consumption, training AlphaFold2 from scratch and running inference require substantial computational resources and time, and the cost of running the original AlphaFold2 is prohibitive for most individuals and institutions. Lowering this cost could therefore further accelerate life science research. We present HelixFold, an implementation of AlphaFold2 in PaddlePaddle that improves training and inference speed and reduces memory consumption. Performance is improved through operator fusion, tensor fusion, and hybrid parallel computation, while memory usage is reduced through recomputation (Recompute), BFloat16, and in-place memory reads/writes. Both the original AlphaFold2 (implemented in JAX) and OpenFold (implemented in PyTorch) take about 11 days to complete full end-to-end training, whereas HelixFold needs only 7.5 days, and only 5.3 days with hybrid parallelism, roughly halving the training time. We verify that HelixFold's accuracy is on par with that of AlphaFold2 on the CASP14 and CAMEO datasets. HelixFold's code is freely available on GitHub at https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold, and we also provide a stable web service at https://paddlehelix.baidu.com/app/drug/protein/forecast.