In antibody engineering, an essential task is to design a novel antibody whose paratope binds the correct epitope on a specific antigen. Understanding an antibody's structure and its paratope facilitates a mechanistic understanding of its function. Predicting antibody structure from sequence alone has therefore long been a highly valuable problem for de novo antibody design. AlphaFold2, a breakthrough in structural biology, predicts protein structure from protein sequences and computationally expensive coevolutionary multiple sequence alignments (MSAs). However, its computational cost and its unsatisfactory accuracy on antibodies, especially on the complementarity-determining regions (CDRs), limit its application in industrial high-throughput drug design. To learn an informative representation of antibodies, we trained a deep transformer-based antibody language model (ALM) on curated sequences from the Observed Antibody Space database. We also developed a novel model, xTrimoABFold, that predicts antibody structure from antibody sequence using the pretrained ALM together with efficient Evoformers and structure modules. The model was trained end-to-end on the antibody structures in PDB by minimizing an ensemble loss that combines a domain-specific focal loss on the CDRs with the frame-aligned point loss. xTrimoABFold outperforms AlphaFold2 and other protein language model based SOTAs, e.g., OmegaFold, HelixFold-Single, and IgFold, by a large and significant margin (more than 30% improvement in RMSD) while running 151 times faster than AlphaFold2. To the best of our knowledge, xTrimoABFold achieves state-of-the-art antibody structure prediction. Its improvements in both accuracy and efficiency make it a valuable tool for de novo antibody design and could enable further advances in immuno-theory.