How can we design protein sequences that fold into desired structures both effectively and efficiently? Structure-based protein design has attracted increasing attention in recent years; however, few methods improve accuracy and efficiency at the same time, because they lack expressive features and rely on autoregressive sequence decoders. To address these issues, we propose PiFold, which combines a novel residue featurizer with PiGNN layers to generate protein sequences in a one-shot way with improved recovery. Experiments show that PiFold achieves 51.66\% recovery on CATH 4.2, while its inference is 70 times faster than that of autoregressive competitors. In addition, PiFold achieves 58.72\% and 60.42\% recovery on TS50 and TS500, respectively. We conduct comprehensive ablation studies to reveal the role of different types of protein features and model designs, inspiring further simplification and improvement.
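For readers unfamiliar with the recovery metric quoted above, the following is a minimal sketch, assuming recovery means the fraction of positions at which a designed sequence matches the native one. The function name `sequence_recovery` and the two example sequences are hypothetical illustrations, not taken from the paper, and how per-protein scores are aggregated over a benchmark is not specified here.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Return the fraction of positions where the designed residue equals the native one.

    Assumption: both sequences are aligned position by position and have equal length.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    # Count identical residues at matching positions.
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


# Hypothetical 10-residue example: the sequences differ at two positions.
native = "MKTAYIAKQR"
designed = "MKSAYLAKQR"
print(f"{sequence_recovery(native, designed):.2%}")
```

Under this definition, a recovery of 51.66\% would mean that roughly half of the designed residues coincide with the native residues at the same positions.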