The linear sequence of amino acids determines protein structure and function. Protein design, often described as the inverse of protein structure prediction, aims to obtain a novel protein sequence that will fold into a given target structure. Recent work on computational protein design has studied designing sequences for a desired backbone structure using local positional information and has achieved competitive performance. However, similar local environments in different backbone structures may correspond to different amino acids, indicating that the global context of the protein structure matters. We therefore propose the Global-Context Aware generative de novo protein design method (GCA), which consists of local and global modules. While the local modules focus on relationships between neighboring amino acids, the global modules explicitly capture non-local contexts. Experimental results demonstrate that the proposed GCA method outperforms state-of-the-art methods on de novo protein design. Our code and pretrained model will be released.
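To make the distinction between local and global modules concrete, the following is a minimal NumPy sketch, not the GCA implementation. It assumes residues are represented by 3D coordinates and feature vectors, that "local" means attention restricted to the k nearest residues in space, and that "global" means attention over all residues. The function names (`knn_indices`, `local_module`, `global_module`), the choice of k, and the use of unparameterized dot-product attention are all illustrative assumptions.

```python
import numpy as np

def knn_indices(coords, k):
    """Indices of the k spatially nearest residues (excluding self)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a residue is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_module(h, neighbors):
    """Hypothetical local module: attention over k neighbors only."""
    d = h.shape[1]
    h_nb = h[neighbors]                                  # (N, k, d)
    scores = np.einsum("nd,nkd->nk", h, h_nb) / np.sqrt(d)
    return h + np.einsum("nk,nkd->nd", softmax(scores), h_nb)

def global_module(h):
    """Hypothetical global module: attention over every residue."""
    d = h.shape[1]
    scores = h @ h.T / np.sqrt(d)                        # (N, N)
    return h + softmax(scores) @ h

# Toy backbone: 50 residues with random coordinates and 16-dim features.
rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 3))
h = rng.normal(size=(50, 16))

neighbors = knn_indices(coords, k=8)
h_local = local_module(h, neighbors)   # sees only nearby residues
h_out = global_module(h_local)         # mixes in non-local context
```

In this sketch, changing the features of a residue outside another residue's neighborhood leaves that residue's local output unchanged but alters its global output, which is the kind of non-local dependence the abstract argues local-only models miss.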