Discovering new drug molecules is a pivotal yet challenging process because chemical space is nearly infinite and the search is notoriously demanding in time and resources. Numerous generative models have recently been introduced to accelerate drug discovery, but their progression to experimental validation remains limited, largely because synthetic accessibility is insufficiently considered in practical settings. In this work, we introduce a novel framework that generates new chemical structures while ensuring synthetic accessibility. Specifically, we introduce a postfix notation of synthetic pathways to represent molecules in chemical space, and we design a transformer-based model that translates molecular graphs into these postfix notations of synthesis. We highlight the model's ability to (a) perform bottom-up synthesis planning more accurately, (b) generate structurally similar, synthesizable analogs with preserved properties for unsynthesizable molecules proposed by generative models, and (c) explore the local synthesizable chemical space around hit molecules.
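To illustrate what a postfix notation of a synthetic pathway can look like, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the token format, so everything here is an assumption: the `BuildingBlock` and `Reaction` classes, the `evaluate_postfix` function, and the reaction and building-block names are all hypothetical, and plain strings stand in for real molecules. The idea shown is the standard stack evaluation of a postfix sequence: building-block tokens are pushed, and each reaction token pops its reactants and pushes the product.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass(frozen=True)
class BuildingBlock:
    """Hypothetical token for a purchasable starting material."""
    name: str  # placeholder label; a real system would hold a molecule


@dataclass(frozen=True)
class Reaction:
    """Hypothetical token for a reaction template."""
    name: str
    arity: int  # number of reactants consumed from the stack


Token = Union[BuildingBlock, Reaction]


def evaluate_postfix(tokens: Sequence[Token]) -> str:
    """Evaluate a postfix pathway and return a nested description of the product.

    Strings stand in for molecules (an assumption for illustration); each
    reaction is rendered as "name(reactant, ...)" instead of being executed.
    """
    stack: List[str] = []
    for tok in tokens:
        if isinstance(tok, BuildingBlock):
            # Operands are pushed in the order they appear.
            stack.append(tok.name)
            continue
        if tok.arity < 1:
            raise ValueError(f"reaction {tok.name!r} must consume at least one reactant")
        if len(stack) < tok.arity:
            raise ValueError(f"reaction {tok.name!r} needs {tok.arity} reactants")
        # Pop the reactants, preserving their left-to-right order.
        reactants = stack[-tok.arity:]
        del stack[-tok.arity:]
        stack.append(f"{tok.name}({', '.join(reactants)})")
    if len(stack) != 1:
        raise ValueError("a valid pathway must reduce to exactly one product")
    return stack[0]


# A two-step pathway written bottom-up: couple two building blocks,
# then react the intermediate with a third one.
pathway = [
    BuildingBlock("acid"),
    BuildingBlock("amine"),
    Reaction("amide_coupling", 2),
    BuildingBlock("boronic_acid"),
    Reaction("suzuki", 2),
]
print(evaluate_postfix(pathway))
```

Because every valid sequence reduces to a single product built only from building blocks and reaction steps, a model that emits such sequences proposes a synthesis route together with each molecule. In a real setting the string formatting would be replaced by applying a reaction template to actual molecular structures.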