Machine learning (ML) is revolutionizing protein structural analysis, including the important subproblem of predicting protein residue contact maps, i.e., determining which amino-acid residues are in close spatial proximity given the amino-acid sequence of a protein. Despite recent progress in ML-based protein contact prediction, predicting contacts across a wide range of distances (commonly classified into short-, medium- and long-range contacts) remains a challenge. Here, we propose a multiscale graph neural network (GNN)-based approach that takes its cue from multiscale physics simulations: a standard pipeline involving a recurrent neural network (RNN) is augmented with three GNNs that refine predictive capability for short-, medium- and long-range residue contacts, respectively. Test results on the ProteinNet dataset show that the proposed multiscale RNN+GNN approach improves accuracy over the conventional approach for contacts of all ranges, including the most challenging case of long-range contact prediction.
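To make the notions of a contact map and of short-, medium- and long-range contacts concrete, the following is a minimal sketch and not the method proposed here. It assumes conventions that the abstract does not state but that are common in the contact-prediction literature: two residues are in contact when their representative atoms lie within 8 Å, and contacts are binned by sequence separation |i - j| into short (6 to 11), medium (12 to 23) and long (24 or more). The names `contact_map`, `split_by_range`, `CONTACT_CUTOFF` and `RANGES` are hypothetical.

```python
import numpy as np

# Assumed conventions (not taken from the abstract):
# distance cutoff in angstroms and sequence-separation bands.
CONTACT_CUTOFF = 8.0
RANGES = {"short": (6, 11), "medium": (12, 23), "long": (24, None)}


def contact_map(coords, cutoff=CONTACT_CUTOFF):
    """Return a boolean (n, n) matrix: True where two residues are within `cutoff`.

    `coords` is an (n, 3) array with one representative atom per residue.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]   # pairwise displacement vectors
    dist = np.linalg.norm(diff, axis=-1)             # pairwise Euclidean distances
    return dist < cutoff


def split_by_range(contacts):
    """Split a contact map into short-, medium- and long-range contact maps."""
    n = contacts.shape[0]
    idx = np.arange(n)
    sep = np.abs(idx[:, None] - idx[None, :])        # sequence separation |i - j|
    bands = {}
    for name, (lo, hi) in RANGES.items():
        mask = sep >= lo
        if hi is not None:
            mask &= sep <= hi
        bands[name] = contacts & mask
    return bands
```

Calling `split_by_range(contact_map(coords))` on an `(n, 3)` coordinate array yields three boolean matrices, one per range. Under the stated assumptions, these correspond to the three kinds of targets that the short-, medium- and long-range GNNs would each refine.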