Computational methods that operate on three-dimensional molecular structure have the potential to solve important questions in biology and chemistry. In particular, deep neural networks have gained significant attention, but their widespread adoption in the biomolecular domain has been limited by a lack of either systematic performance benchmarks or a unified toolkit for interacting with molecular data. To address this, we present ATOM3D, a collection of both novel and existing benchmark datasets spanning several key classes of biomolecules. We implement several classes of three-dimensional molecular learning methods for each of these tasks and show that they consistently improve performance relative to methods based on one- and two-dimensional representations. The specific choice of architecture proves to be critical for performance, with three-dimensional convolutional networks excelling at tasks involving complex geometries, graph networks performing well on systems requiring detailed positional information, and the more recently developed equivariant networks showing significant promise. Our results indicate that many molecular problems stand to gain from three-dimensional molecular learning, and that there is potential for improvement on many tasks which remain underexplored. To lower the barrier to entry and facilitate further developments in the field, we also provide a comprehensive suite of tools for dataset processing, model training, and evaluation in our open-source atom3d Python package. All datasets are available for download from https://www.atom3d.ai.