CoGANPPIS: Coevolution-enhanced Global Attention Neural Network for Protein-Protein Interaction Site Prediction

Protein-protein interactions are essential in biochemical processes. Accurate prediction of protein-protein interaction sites (PPIs) deepens our understanding of biological mechanisms and is crucial for new drug design. However, conventional experimental methods for PPIs prediction are costly and time-consuming, so many computational approaches, especially machine learning (ML)-based methods, have been developed recently. Although these approaches have achieved encouraging results, they still have two limitations: (1) most models exploit some useful input features but fail to take coevolutionary features into account, even though these could provide clues about inter-residue relationships; (2) attention-based models allocate attention weights only to neighboring residues rather than globally, neglecting that some residues far away from the target residue might also matter. We propose CoGANPPIS, a coevolution-enhanced global attention neural network: a sequence-based deep learning model for PPIs prediction. It uses three layers in parallel for feature extraction: (1) a local-level representation aggregation layer, which aggregates the features of neighboring residues; (2) a global-level representation learning layer, which employs a novel coevolution-enhanced global attention mechanism to allocate attention weights to all residues on the same protein sequence; (3) a coevolutionary information learning layer, which applies a CNN and pooling to coevolutionary information to obtain the coevolutionary profile representation. The three outputs are then concatenated and passed through several fully connected layers for the final prediction. Experiments on two benchmark datasets demonstrate that our model achieves state-of-the-art performance. The source code is publicly available at https://github.com/Slam1423/CoGANPPIS_source_code.
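To make the three-branch layout concrete, here is a minimal, untrained sketch in plain Python. It is not the CoGANPPIS implementation, and several choices in it are assumptions rather than anything the abstract specifies:

- The local branch is taken to be a mean over a window of ±`w` residues.
- Coevolution is injected into global attention as an additive bias `C[i][j]` on scaled dot-product scores.
- The coevolution branch is taken to be 1-D convolutions over the target residue's row of a coevolution matrix `C`, followed by ReLU and global max pooling.
- The inputs `X` (per-residue features) and `C` (an L × L coevolution score matrix), all function names, and all layer sizes are hypothetical.

```python
import math
import random

# Hypothetical sketch of a three-branch per-residue classifier; NOT the authors' code.
# X: L x d per-residue feature matrix; C: L x L coevolution score matrix (assumed inputs).

def local_aggregate(X, i, w):
    """Local branch (assumed form): mean of features in the window [i-w, i+w], clipped at the ends."""
    lo, hi = max(0, i - w), min(len(X), i + w + 1)
    window = X[lo:hi]
    return [sum(row[k] for row in window) / len(window) for k in range(len(X[0]))]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def global_attention(X, C, i):
    """Global branch: residue i attends to ALL residues in the sequence.
    Assumption: coevolution enters as an additive bias C[i][j] on scaled dot-product scores."""
    d = len(X[0])
    scores = [sum(a * b for a, b in zip(X[i], X[j])) / math.sqrt(d) + C[i][j]
              for j in range(len(X))]
    weights = softmax(scores)
    return [sum(weights[j] * X[j][k] for j in range(len(X))) for k in range(d)]

def coevolution_profile(C, i, kernels):
    """Coevolution branch (assumed form): for each 1-D kernel, slide it over row C[i],
    apply ReLU, then global max pooling, so the output length equals len(kernels)."""
    row = C[i]
    out = []
    for kernel in kernels:
        k = len(kernel)
        conv = [max(0.0, sum(kernel[t] * row[s + t] for t in range(k)))
                for s in range(len(row) - k + 1)]
        out.append(max(conv) if conv else 0.0)
    return out

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    # Stable logistic function: math.exp only ever receives a non-positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dense(x, W, b, act):
    """Fully connected layer: act(W @ x + b) with W given as a list of rows."""
    return [act(sum(w * v for w, v in zip(row, x)) + bias) for row, bias in zip(W, b)]

def random_params(d, n_kernels=4, kernel_size=3, hidden=8, seed=0):
    """Untrained random weights for illustration only (hypothetical sizes)."""
    rng = random.Random(seed)
    in_dim = 2 * d + n_kernels  # local (d) + global (d) + coevolution profile (n_kernels)
    return {
        "kernels": [[rng.gauss(0, 0.5) for _ in range(kernel_size)] for _ in range(n_kernels)],
        "W1": [[rng.gauss(0, 0.3) for _ in range(in_dim)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "W2": [[rng.gauss(0, 0.3) for _ in range(hidden)]],
        "b2": [0.0],
    }

def predict(X, C, params, w=3):
    """Run the three branches in parallel for each residue, concatenate, then apply FC layers."""
    probs = []
    for i in range(len(X)):
        h = (local_aggregate(X, i, w)
             + global_attention(X, C, i)
             + coevolution_profile(C, i, params["kernels"]))
        h = dense(h, params["W1"], params["b1"], relu)
        (p,) = dense(h, params["W2"], params["b2"], sigmoid)
        probs.append(p)  # score in (0, 1): probability that residue i is an interaction site
    return probs

if __name__ == "__main__":
    rng = random.Random(42)
    L, d = 12, 5
    X = [[rng.random() for _ in range(d)] for _ in range(L)]
    C = [[0.0 if i == j else rng.random() for j in range(L)] for i in range(L)]
    print([round(p, 3) for p in predict(X, C, random_params(d))])
```

Global max pooling keeps the coevolution vector at a fixed length regardless of protein length, which is what allows the three branch outputs to be concatenated for sequences of any size. Other ways of injecting coevolution into attention (for example, gating or concatenating coupling features before the attention projection) would change only `global_attention`. The repository linked above is the reference for what CoGANPPIS actually does.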
