Protein-ligand interactions (PLIs) are fundamental to biochemical research, and their identification is crucial for estimating the biophysical and biochemical properties needed for rational therapeutic design. Experimental characterization of these properties is currently the most accurate method; however, it is time-consuming and labor-intensive. A number of computational methods have been developed in this context, but most existing PLI prediction methods depend heavily on 2D protein sequence data. Here, we present a novel parallel graph neural network (GNN) that integrates knowledge representation and reasoning for PLI prediction, performing deep learning guided by expert knowledge and informed by 3D structural data. We develop two distinct GNN architectures: GNNF is the base implementation, which employs distinct featurization to enhance domain-awareness, while GNNP is a novel implementation that can predict with no prior knowledge of the intermolecular interactions. A comprehensive evaluation demonstrates that the GNNs successfully capture the binary interactions between the ligand and the protein's 3D structure, with test accuracies of 0.979 for GNNF and 0.958 for GNNP in predicting the activity of a protein-ligand complex. These models are further adapted for regression tasks to predict experimental binding affinity and pIC50, both of which are crucial for drug potency and efficacy. We achieve Pearson correlation coefficients of 0.66 and 0.65 on experimental affinity and 0.50 and 0.51 on pIC50 with GNNF and GNNP, respectively, outperforming similar 2D sequence-based models. Our method can serve as an interpretable and explainable artificial intelligence (AI) tool for predicting the activity, potency, and biophysical properties of lead candidates. To this end, we demonstrate the utility of GNNP on SARS-CoV-2 protein targets by screening a large compound library and comparing our predictions with experimentally measured data.