Protein folding neural networks (PFNNs) such as AlphaFold predict protein structures with remarkable accuracy compared to other approaches. However, the robustness of such networks has not previously been explored. This is particularly relevant given the broad social implications of such technologies and the fact that biologically small perturbations in a protein sequence do not generally lead to drastic changes in the protein structure. In this paper, we demonstrate that AlphaFold does not exhibit such robustness despite its high accuracy. This raises the challenge of detecting and quantifying the extent to which its predicted protein structures can be trusted. To measure the robustness of the predicted structures, we use (i) the root-mean-square deviation (RMSD) and (ii) the Global Distance Test (GDT) similarity measure between the predicted structure of the original sequence and the predicted structure of its adversarially perturbed version. We prove that the problem of minimally perturbing protein sequences to fool protein folding neural networks is NP-complete. Based on the well-established BLOSUM62 sequence alignment scoring matrix, we generate adversarial protein sequences and show that the RMSD between the predicted structure of the adversarial sequence and that of the original sequence is very large when the adversarial changes are bounded by (i) 20 units in the BLOSUM62 distance and (ii) five residues (out of hundreds or thousands of residues) in the given protein sequence. In our experimental evaluation, we consider 111 COVID-19 proteins in the Universal Protein Resource (UniProt), a central resource for protein data managed by the European Bioinformatics Institute, the Swiss Institute of Bioinformatics, and the US Protein Information Resource. The adversarial sequences yield an average GDT similarity score of around 34%, demonstrating a substantial drop in the performance of AlphaFold.
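To make the two structural measures concrete, the following is a minimal sketch of how RMSD and a GDT-style score could be computed between two predicted structures. It is not the authors' evaluation code. It assumes each structure is given as a list of C-alpha coordinates in angstroms, that both lists have the same length, and that the structures have already been superimposed. The full GDT procedure searches over many superpositions, so the single-superposition score here is a simplification. The function names and the toy coordinates are hypothetical.

```python
import math


def per_residue_distances(coords_a, coords_b):
    """Euclidean distance between matching residues of two structures.

    Assumes both structures are already superimposed and list one
    (x, y, z) C-alpha coordinate per residue, in the same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("structures must be non-empty and of equal length")
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over all residue pairs."""
    dists = per_residue_distances(coords_a, coords_b)
    return math.sqrt(sum(d * d for d in dists) / len(dists))


def gdt_ts(coords_a, coords_b, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS as a percentage.

    Averages, over the cutoffs, the fraction of residues whose distance
    is within the cutoff. Uses one fixed superposition (an assumption),
    whereas the full GDT procedure optimizes the superposition.
    """
    dists = per_residue_distances(coords_a, coords_b)
    fractions = [sum(1 for d in dists if d <= c) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example (hypothetical coordinates, not data from the paper):
# four residues displaced by 0.5, 1.5, 3.0 and 10.0 angstroms.
original = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
perturbed = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

example_rmsd = rmsd(original, perturbed)
example_gdt = gdt_ts(original, perturbed)
print(f"RMSD = {example_rmsd:.3f} A, GDT_TS = {example_gdt:.2f}%")
```

A low RMSD and a high GDT score both indicate that the two structures agree. In this sketch, a single badly displaced residue inflates the RMSD sharply but lowers the GDT-style score more gradually, which is one reason to report both measures.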