Viral infections cause significant morbidity and mortality worldwide. Understanding the interaction patterns between a particular virus and human proteins is crucial to unveiling the underlying mechanisms of viral infection and pathogenesis, which in turn can help in the prevention and treatment of virus-related diseases. However, predicting protein-protein interactions between a new virus and human cells is extremely challenging because data on virus-human interactions are scarce and most viruses mutate quickly. We developed a multitask transfer learning approach that exploits information from around 24 million protein sequences and the interaction patterns of the human interactome to counter the problem of small training datasets. Instead of using hand-crafted protein features, we use statistically rich protein representations learned by a deep language modeling approach from a massive source of protein sequences. We also employ an additional objective that aims to maximize the probability of observing human protein-protein interactions. This auxiliary objective acts as a regularizer and allows domain knowledge to be incorporated into the virus-human protein-protein interaction prediction model. Our approach achieved competitive results on 13 benchmark datasets and in a case study on the SARS-CoV-2 virus receptor. Experimental results show that our proposed model works effectively for both virus-human and bacteria-human protein-protein interaction prediction tasks. We share our code for reproducibility and future research at https://git.l3s.uni-hannover.de/dong/multitask-transfer.
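To make the role of the auxiliary objective concrete, the following is a minimal sketch of how a main virus-human loss and an auxiliary human-human loss might be combined. It is not the authors' implementation: the use of binary cross-entropy, the weighted-sum form, and the names `binary_cross_entropy`, `multitask_loss`, and `aux_weight` are all assumptions made for illustration.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp to avoid log(0) for predictions of exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def multitask_loss(vh_probs, vh_labels, hh_probs, hh_labels, aux_weight=0.5):
    """Main virus-human loss plus a weighted auxiliary human-human loss.

    Assumption: the two objectives are combined as a weighted sum; `aux_weight`
    is a hypothetical hyperparameter controlling the regularization strength.
    """
    main = binary_cross_entropy(vh_probs, vh_labels)  # virus-human PPI task
    aux = binary_cross_entropy(hh_probs, hh_labels)   # human-human PPI task
    return main + aux_weight * aux


if __name__ == "__main__":
    # Toy predicted probabilities and labels, invented purely for illustration.
    loss = multitask_loss(
        vh_probs=[0.9, 0.2], vh_labels=[1, 0],
        hh_probs=[0.7, 0.4, 0.1], hh_labels=[1, 1, 0],
    )
    print(f"combined loss: {loss:.4f}")
```

In this sketch, setting `aux_weight` to 0 recovers the single-task objective, while larger values let the human interactome exert a stronger regularizing influence on the shared protein representations.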