Scalable Fragment-Based 3D Molecular Design with Reinforcement Learning

Machine learning has the potential to automate molecular design and drastically accelerate the discovery of new functional compounds. Towards this goal, generative models and reinforcement learning (RL) using string and graph representations have been successfully used to search for novel molecules. However, these approaches are limited since their representations ignore the three-dimensional (3D) structure of molecules. In fact, geometry plays an important role in many applications in inverse molecular design, especially in drug discovery. Thus, it is important to build models that can generate molecular structures in 3D space based on property-oriented geometric constraints. To address this, one approach is to generate molecules as 3D point clouds by sequentially placing atoms at locations in space -- this allows the process to be guided by physical quantities such as energy or other properties. However, this approach is inefficient as placing individual atoms makes the exploration unnecessarily deep, limiting the complexity of molecules that can be generated. Moreover, when optimizing a molecule, organic and medicinal chemists use known fragments and functional groups, not single atoms. We introduce a novel RL framework for scalable 3D design that uses a hierarchical agent to build molecules by placing molecular substructures sequentially in 3D space, thus attempting to build on the existing human knowledge in the field of molecular design. In a variety of experiments with different substructures, we show that our agent, guided only by energy considerations, can efficiently learn to produce molecules with over 100 atoms from many distributions including drug-like molecules, organic LED molecules, and biomolecules.
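The efficiency argument above (atom-by-atom placement makes the search deeper than fragment-by-fragment placement) can be illustrated with a small counting sketch. This is not the authors' implementation: the fragment library `FRAGMENT_ATOMS`, the helper `episode_lengths`, and the example build sequence are hypothetical, and the count ignores any atoms that would be removed when fragments are joined.

```python
# Hypothetical fragment library (an assumption for illustration only):
# fragment name -> number of atoms, hydrogens included.
FRAGMENT_ATOMS = {
    "benzene": 12,  # C6H6
    "methane": 5,   # CH4
    "ammonia": 4,   # NH3
    "water": 3,     # H2O
}


def episode_lengths(fragments):
    """Return (atom_steps, fragment_steps) for one build sequence.

    atom_steps: placement actions needed if every atom is placed one by one.
    fragment_steps: placement actions needed if whole fragments are placed.
    Simplification: atoms lost when fragments bond are not subtracted.
    """
    atom_steps = sum(FRAGMENT_ATOMS[name] for name in fragments)
    fragment_steps = len(fragments)
    return atom_steps, fragment_steps


# A made-up build sequence that reaches just over 100 atoms.
build = ["benzene"] * 6 + ["methane"] * 4 + ["ammonia"] * 2 + ["water"] * 2
atom_steps, fragment_steps = episode_lengths(build)
print(f"atom-wise steps: {atom_steps}, fragment-wise steps: {fragment_steps}")
```

Under these assumptions, the same structure needs far fewer sequential decisions when the agent places fragments, which is the sense in which fragment placement keeps the exploration shallow.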
