CoGANPPIS: Coevolution-enhanced Global Attention Neural Network for Protein-Protein Interaction Site Prediction

Protein-protein interactions are essential to biochemical processes. Accurate prediction of protein-protein interaction sites (PPIs) deepens our understanding of biological mechanisms and is crucial for new drug design. However, conventional experimental methods for identifying PPIs are costly and time-consuming, so many computational approaches, especially machine-learning-based methods, have been developed recently. Although these approaches have achieved promising results, two limitations remain: (1) most models exploit useful input features but fail to take coevolutionary features into account, which could provide clues about inter-residue relationships; (2) attention-based models allocate attention weights only to neighboring residues rather than globally, neglecting that residues far from the target residue might also matter. We propose CoGANPPIS, a coevolution-enhanced global attention neural network, which is a sequence-based deep learning model for PPIs prediction. It uses three layers in parallel for feature extraction: (1) a local-level representation aggregation layer, which aggregates the features of neighboring residues; (2) a global-level representation learning layer, which employs a novel coevolution-enhanced global attention mechanism to allocate attention weights to all residues on the same protein sequence; (3) a coevolutionary information learning layer, which applies a CNN and pooling to coevolutionary information to obtain the coevolutionary profile representation. The three outputs are then concatenated and passed into several fully connected layers for the final prediction. Applied to two benchmark datasets, our model achieves state-of-the-art performance. The source code is publicly available at https://github.com/Slam1423/CoGANPPIS_source_code.
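The three parallel branches described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal forward pass under stated assumptions: per-residue features are a plain `L x d` matrix, coevolutionary information is simplified to a symmetric `L x L` matrix of coupling strengths, the global attention score is a scaled dot product plus the coevolution strength, and all weights are random and untrained. The function names, the window size, and the pooling choices are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def local_branch(feats, i, w=1):
    """Mean of the feature vectors within w positions of residue i (clipped at the ends)."""
    lo, hi = max(0, i - w), min(len(feats), i + w + 1)
    return feats[lo:hi].mean(axis=0)

def global_branch(feats, coevo, i):
    """Attention over ALL residues of the sequence, not only neighbors.

    Assumed scoring: scaled dot-product similarity plus coevolution strength.
    """
    scores = feats @ feats[i] / np.sqrt(feats.shape[1]) + coevo[i]
    weights = softmax(scores)
    return weights @ feats, weights

def coevo_branch(coevo, i, kernel):
    """1-D convolution over residue i's coevolution row, then max and mean pooling."""
    conv = np.convolve(coevo[i], kernel, mode="valid")
    return np.array([conv.max(), conv.mean()])

def predict_site(feats, coevo, i, params):
    """Concatenate the three branch outputs and apply two fully connected layers."""
    local = local_branch(feats, i)
    glob, weights = global_branch(feats, coevo, i)
    profile = coevo_branch(coevo, i, params["kernel"])
    x = np.concatenate([local, glob, profile])
    hidden = np.maximum(0.0, params["W1"] @ x + params["b1"])  # ReLU
    logit = params["w2"] @ hidden + params["b2"]
    return 1.0 / (1.0 + np.exp(-logit)), weights               # sigmoid probability

# Toy input: 8 residues with 4 features each (random placeholders, not real data).
rng = np.random.default_rng(0)
L, d, n_hidden = 8, 4, 6
feats = rng.normal(size=(L, d))
coevo = rng.random((L, L))
coevo = (coevo + coevo.T) / 2                                  # symmetric couplings

params = {
    "kernel": rng.normal(size=3),
    "W1": rng.normal(size=(n_hidden, 2 * d + 2)),              # local + global + 2 pooled values
    "b1": np.zeros(n_hidden),
    "w2": rng.normal(size=n_hidden),
    "b2": 0.0,
}

prob, attn = predict_site(feats, coevo, 2, params)
print(f"interaction-site probability for residue 2: {prob:.3f}")
print("attention weights over all residues:", np.round(attn, 3))
```

In this sketch, `attn` has one weight per residue in the sequence, which is the point of the global attention branch: distant residues with strong coevolutionary coupling to the target can still receive substantial weight.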
