Nuclear Magnetic Resonance (NMR) spectroscopy is one of the major techniques in structural biology, with over 11,800 protein structures deposited in the Protein Data Bank. NMR can elucidate the structures and dynamics of small and medium-sized proteins in solution, living cells, and solids, but has been limited by the tedious data analysis process. It typically requires weeks or months of manual work by a trained expert to turn NMR measurements into a protein structure. Automation of this process is an open problem, formulated in the field over 30 years ago. Here, we present a solution to this challenge that enables the completely automated analysis of protein NMR data within hours after completing the measurements. Using only NMR spectra and the protein sequence as input, our machine learning-based method, ARTINA, delivers signal positions, resonance assignments, and structures strictly without any human intervention. Tested on a 100-protein benchmark comprising 1329 multidimensional NMR spectra, ARTINA demonstrated its ability to solve structures with 1.44 Å median RMSD to the PDB reference and to identify 91.36% correct NMR resonance assignments. ARTINA can be used by non-experts, reducing the effort for a protein assignment or structure determination by NMR essentially to the preparation of the sample and the spectra measurements.