CoGANPPIS: Coevolution-enhanced Global Attention Neural Network for Protein-Protein Interaction Site Prediction

Protein-protein interactions are essential to biochemical processes. Accurate prediction of protein-protein interaction sites (PPIs) deepens our understanding of biological mechanisms and is crucial for new drug design. However, conventional experimental methods for identifying PPIs are costly and time-consuming, so many computational approaches, especially machine-learning-based methods, have been developed in recent years. Although these approaches have achieved encouraging results, two limitations remain: (1) most models exploit a number of useful input features but fail to take coevolutionary features into account, even though these could provide clues about inter-residue relationships; (2) attention-based models allocate attention weights only to neighboring residues rather than globally, neglecting the possibility that residues far from the target residue may also matter. We propose CoGANPPIS, a coevolution-enhanced global attention neural network, which is a sequence-based deep learning model for PPIs prediction. It uses three layers in parallel for feature extraction: (1) a local-level representation aggregation layer, which aggregates the features of neighboring residues; (2) a global-level representation learning layer, which employs a novel coevolution-enhanced global attention mechanism to allocate attention weights to all residues in the same protein sequence; (3) a coevolutionary information learning layer, which applies a CNN and pooling to coevolutionary information to obtain a coevolutionary profile representation. The three outputs are then concatenated and passed through several fully connected layers for the final prediction. Applied to two benchmark datasets, our model achieves state-of-the-art performance. The source code is publicly available at https://github.com/Slam1423/CoGANPPIS_source_code.

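
The three parallel branches and the final concatenation described in the abstract can be illustrated with a small, dependency-free Python sketch. This is not the authors' implementation (that lives in the repository linked above) and it makes several assumptions the abstract does not confirm: the local branch is a plain windowed mean, the coevolution "enhancement" is modeled as an additive bias on scaled dot-product attention scores, the CNN is a single 1D kernel followed by max and mean pooling, and all names (`local_branch`, `global_branch`, `coevo_branch`, `predict_site_probability`, `params`) and sizes are hypothetical. The weights are random, so the outputs only show the data flow, not meaningful predictions.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def local_branch(features, i, half_window):
    """Local level: average the feature vectors of residues near residue i."""
    lo, hi = max(0, i - half_window), min(len(features), i + half_window + 1)
    window = features[lo:hi]
    return [sum(col) / len(window) for col in zip(*window)]


def global_branch(features, coevo, i):
    """Global level: attend over ALL residues of the sequence.

    Assumed score: scaled dot product with the target residue plus the
    coevolution score coevo[i][j] as an additive bias.
    """
    target = features[i]
    scale = math.sqrt(len(target))
    scores = [
        sum(a * b for a, b in zip(target, f)) / scale + coevo[i][j]
        for j, f in enumerate(features)
    ]
    weights = softmax(scores)
    context = [
        sum(w * f[d] for w, f in zip(weights, features))
        for d in range(len(target))
    ]
    return context, weights


def coevo_branch(coevo_row, kernel):
    """Coevolution level: 1D convolution + ReLU, then max and mean pooling."""
    k = len(kernel)
    conv = [
        max(0.0, sum(kernel[t] * coevo_row[s + t] for t in range(k)))
        for s in range(len(coevo_row) - k + 1)
    ]
    return [max(conv), sum(conv) / len(conv)]


def predict_site_probability(features, coevo, i, params):
    """Concatenate the three branch outputs and apply two dense layers."""
    local = local_branch(features, i, params["half_window"])
    context, _ = global_branch(features, coevo, i)
    profile = coevo_branch(coevo[i], params["kernel"])
    x = local + context + profile  # list concatenation of the three outputs
    hidden = [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    logit = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs: 12 residues, 4 features each, symmetric coevolution matrix.
rng = random.Random(0)
L, D, H = 12, 4, 5
features = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(L)]
coevo = [[0.0] * L for _ in range(L)]
for a in range(L):
    for b in range(a + 1, L):
        coevo[a][b] = coevo[b][a] = rng.random()

# Untrained, randomly initialized parameters (illustration only).
params = {
    "half_window": 2,
    "kernel": [rng.uniform(-1, 1) for _ in range(3)],
    "W1": [[rng.uniform(-1, 1) for _ in range(2 * D + 2)] for _ in range(H)],
    "b1": [0.0] * H,
    "w2": [rng.uniform(-1, 1) for _ in range(H)],
    "b2": 0.0,
}
probs = [predict_site_probability(features, coevo, i, params) for i in range(L)]
print([round(p, 3) for p in probs])
```

The point of the sketch is the contrast between the branches: `local_branch` only sees a fixed window around the target residue, whereas `global_branch` assigns a weight to every residue in the sequence, and the coevolution matrix both biases those weights and feeds its own pooled profile into the final classifier.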