Predicting the structure of multi-protein complexes is a grand challenge in biochemistry, with major implications for basic science and drug discovery. Computational structure prediction methods generally leverage pre-defined structural features to distinguish accurate structural models from less accurate ones. This raises the question of whether it is possible to learn characteristics of accurate models directly from atomic coordinates of protein complexes, with no prior assumptions. Here we introduce a machine learning method that learns directly from the 3D positions of all atoms to identify accurate models of protein complexes, without using any pre-computed physics-inspired or statistical terms. Our neural network architecture combines multiple ingredients that together enable end-to-end learning from molecular structures containing tens of thousands of atoms: a point-based representation of atoms, equivariance with respect to rotation and translation, local convolutions, and hierarchical subsampling operations. When used in combination with previously developed scoring functions, our network substantially improves the identification of accurate structural models among a large set of possible models. Our network can also be used to predict the accuracy of a given structural model in absolute terms. The architecture we present is readily applicable to other tasks involving learning on 3D structures of large atomic systems.
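To make the architectural ingredients named above more concrete, the following is a minimal NumPy sketch, not the authors' network. It assumes a much simpler setting: atoms are points with scalar features, a "local convolution" aggregates neighbor features through fixed radial basis functions of interatomic distance, and "hierarchical subsampling" keeps every second point. Because it uses only distances, the sketch is invariant to rotation and translation, which is the simplest special case of equivariance. The function names (`local_conv`, `subsample`, `random_rotation`), the cutoff radius, and the random weights are all hypothetical choices made for illustration.

```python
import numpy as np


def local_conv(coords, feats, weights=None, radius=4.0, n_rbf=8):
    """Distance-based local convolution over a point cloud (illustrative only).

    coords: (N, 3) atom positions; feats: (N, C) scalar per-atom features.
    Returns (N, n_rbf * C) features, or (N, C_out) if weights is given.
    """
    # Pairwise distances depend only on relative positions, so they are
    # unchanged by any rigid rotation or translation of the whole structure.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                  # (N, N)

    # Locality: only neighbors within the cutoff radius contribute.
    mask = dist < radius                                  # (N, N), includes self

    # Radial basis expansion of each distance (assumed Gaussian filters).
    centers = np.linspace(0.0, radius, n_rbf)
    rbf = np.exp(-(dist[..., None] - centers) ** 2)       # (N, N, n_rbf)
    rbf = rbf * mask[..., None]

    # Sum neighbor features weighted by each radial filter.
    out = np.einsum("ijk,jc->ikc", rbf, feats)            # (N, n_rbf, C)
    out = out.reshape(len(coords), -1)                    # (N, n_rbf * C)

    # Optional linear mixing; in a trained model these weights are learned.
    if weights is not None:
        out = out @ weights
    return out


def subsample(coords, feats, stride=2):
    """Keep every stride-th point (a stand-in for hierarchical subsampling)."""
    return coords[::stride], feats[::stride]


def random_rotation(rng):
    """Draw a random proper rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))       # fix column signs
    if np.linalg.det(q) < 0:          # ensure det = +1 (no reflection)
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(40, 3))   # toy "atoms"
feats = rng.normal(size=(40, 4))                # toy per-atom features
weights = rng.normal(size=(8 * 4, 16))          # random stand-in for learned weights

# Two stacked stages: convolve locally, then coarsen the point set.
h = local_conv(coords, feats, weights)
coords2, h2 = subsample(coords, h)
h3 = local_conv(coords2, h2, radius=6.0)

# Rigidly move the structure and check that first-layer features agree.
R = random_rotation(rng)
moved = coords @ R.T + np.array([1.0, -2.0, 3.0])
print(np.allclose(h, local_conv(moved, feats, weights)))
```

A model that handles non-scalar (vector or higher-order) features would need genuinely equivariant filters rather than the distance-only filters assumed here; the sketch is meant only to show how point-based input, locality, symmetry, and coarsening fit together.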