The study of multi-type protein-protein interactions (PPIs) is fundamental to understanding biological processes from a systems perspective and to revealing disease mechanisms. Existing methods suffer significant performance degradation when tested on unseen datasets. In this paper, we investigate this problem and find that it is mainly attributable to poor performance in predicting inter-novel-protein interactions, that is, interactions between two proteins that are both absent from the training set. However, current evaluations overlook inter-novel-protein interactions and therefore fail to provide an instructive assessment. We thus address the problem from the perspectives of both evaluation and methodology. First, we design a new evaluation framework that fully accounts for inter-novel-protein interactions and yields consistent assessments across datasets. Second, we argue that correlations between proteins must provide useful information for analyzing novel proteins, and on this basis we propose a graph neural network (GNN) based method, GNN-PPI, for better inter-novel-protein interaction prediction. Experimental results on real-world datasets of different scales demonstrate that GNN-PPI significantly outperforms state-of-the-art PPI prediction methods, especially for inter-novel-protein interaction prediction.
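To make the notion of an inter-novel-protein interaction concrete, the following is a minimal sketch, not the authors' evaluation framework, of how test interactions might be grouped by how many of their two proteins appear in the training set. The helper name `categorize_test_interactions` and the shorthand labels `BS` (both seen), `ES` (either, i.e. exactly one, seen) and `NS` (neither seen) are assumptions chosen for illustration; only the `NS` group corresponds to inter-novel-protein interactions as defined above.

```python
from collections import Counter


def categorize_test_interactions(train_pairs, test_pairs):
    """Label each test interaction by how many of its proteins occur in training.

    Hypothetical helper for illustration:
      'BS' = both proteins seen in training,
      'ES' = exactly one protein seen,
      'NS' = neither protein seen (an inter-novel-protein interaction).
    """
    # Every protein that appears in at least one training interaction.
    seen = {protein for pair in train_pairs for protein in pair}
    labels = []
    for a, b in test_pairs:
        n_seen = int(a in seen) + int(b in seen)  # 0, 1, or 2
        labels.append({2: "BS", 1: "ES", 0: "NS"}[n_seen])
    return labels


# Toy protein identifiers (hypothetical, not from any real dataset).
train = [("P1", "P2"), ("P2", "P3")]
test = [("P1", "P3"), ("P1", "P4"), ("P4", "P5")]

labels = categorize_test_interactions(train, test)
print(labels)  # ['BS', 'ES', 'NS']

# Share of the test set made up of inter-novel-protein interactions.
print(Counter(labels)["NS"] / len(labels))
```

Reporting metrics separately per group, rather than only over the pooled test set, is one simple way to keep inter-novel-protein interactions from being hidden inside an aggregate score, which is the shortcoming of current evaluations that the abstract points to.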