Protein structure prediction is a critical problem with applications in drug design, mutation detection, and protein synthesis, among others. To this end, evolutionary data has been used to build contact maps, which are traditionally treated as energy functions and minimized with gradient-based schemes such as the L-BFGS algorithm. In this paper we present the Alternating Metropolis-Hastings (AMH) algorithm, which (a) significantly improves on the performance of traditional MCMC methods, (b) is inherently parallelizable, allowing significant hardware acceleration on GPUs, and (c) can be integrated with the L-BFGS algorithm to improve its performance. Tested across 9 proteins from recent CASP competitions, the algorithm improves the energy of the structures found by 8.17% to 61.04% (average 38.9%) over traditional MH, and by 0.53% to 17.75% (average 8.9%) over traditional MH with intermittent noisy restarts. We go on to map the AMH algorithm to a GPGPU, which increases the sampling rate by 277x and reduces the simulation time needed to reach a low-energy protein prediction by 7.5x to 26.5x relative to a CPU. We show that our approach can be incorporated into state-of-the-art protein prediction pipelines by applying it to both trRosetta2's energy function and the distogram component of AlphaFold1's energy function. Finally, we note that specially designed probabilistic computers (or p-computers) can provide even better performance than GPUs for MCMC algorithms like the one discussed here.
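For orientation, the traditional MH baseline that the abstract compares against can be sketched as follows. This is a minimal illustration and not the AMH algorithm itself, whose details are not given here. The function `toy_energy` is a hypothetical stand-in for a contact-map energy, and the proposal width, temperature, and step count are assumed values chosen only for the example.

```python
import math
import random


def toy_energy(x):
    """Hypothetical rugged energy landscape (NOT a real contact-map energy)."""
    return sum(xi * xi + 2.0 * math.sin(5.0 * xi) for xi in x)


def metropolis_hastings(energy, x0, n_steps=5000, step=0.3, temperature=1.0, seed=0):
    """Plain Metropolis-Hastings search that tracks the lowest-energy state seen.

    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = list(x0)
    e = energy(x)
    best_x, best_e = list(x), e
    for _ in range(n_steps):
        # Symmetric Gaussian proposal on one randomly chosen coordinate.
        i = rng.randrange(len(x))
        proposal = list(x)
        proposal[i] += rng.gauss(0.0, step)
        e_new = energy(proposal)
        # Metropolis rule: always accept downhill moves; accept uphill
        # moves with probability exp(-(e_new - e) / temperature).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = proposal, e_new
            if e < best_e:
                best_x, best_e = list(x), e
    return best_x, best_e


if __name__ == "__main__":
    start = [2.0, -2.0, 1.5]
    state, value = metropolis_hastings(toy_energy, start)
    print("start energy:", toy_energy(start))
    print("best energy found:", value)
```

Because each step depends on the previous state, a single chain like this one is sequential. The abstract's claim is that AMH is structured so that sampling can be parallelized on a GPU.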