Molecular design and synthesis planning are two critical steps in the process of molecular discovery that we propose to formulate as a single shared task of conditional synthetic pathway generation. We report an amortized approach that generates synthetic pathways as a Markov decision process conditioned on a target molecular embedding. This approach allows us to conduct synthesis planning in a bottom-up manner and to design synthesizable molecules by decoding from optimized conditional codes, demonstrating the potential to solve the problems of design and synthesis simultaneously. The approach uses neural networks to probabilistically model synthetic trees one reaction step at a time, according to reactivity rules encoded in a discrete action space of reaction templates. We train these networks on hundreds of thousands of artificial pathways generated from a pool of purchasable compounds and a list of expert-curated templates. We validate our method through (a) the recovery of molecules using conditional generation, (b) the identification of synthesizable structural analogs, and (c) the optimization of molecular structures given oracle functions relevant to drug discovery.
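The idea of generating artificial pathways one reaction step at a time from purchasable compounds and templates can be illustrated with a minimal, self-contained sketch. It assumes a hypothetical toy chemistry in which a molecule is a name plus a set of "functional group" tags, and each template consumes one tag from each of two reactants; `Mol`, `Template`, `sample_pathway`, and `toy_amide_coupling` are illustrative names, not the authors' code. Each loop iteration is one action in a Markov decision process (apply a matching template with a purchasable partner), and sampling those actions at random yields synthetic training pathways. Unlike the general synthetic trees modeled in the paper, this sketch grows only linear pathways and uses a uniform random policy rather than learned networks conditioned on a molecular embedding.

```python
import random
from dataclasses import dataclass

# Hypothetical toy chemistry: tags stand in for reactive groups; these are
# NOT real reaction templates or molecular graphs.
@dataclass(frozen=True)
class Mol:
    name: str
    groups: frozenset  # toy stand-in for reactive functional groups

@dataclass(frozen=True)
class Template:
    name: str
    group_a: str  # group required (and consumed) on the first reactant
    group_b: str  # group required (and consumed) on the second reactant

    def matches(self, a: Mol, b: Mol) -> bool:
        return self.group_a in a.groups and self.group_b in b.groups

    def apply(self, a: Mol, b: Mol) -> Mol:
        # Toy "reaction": consume the matched groups, keep all others.
        groups = (a.groups - {self.group_a}) | (b.groups - {self.group_b})
        return Mol(f"({a.name}.{b.name})", frozenset(groups))

def sample_pathway(blocks, templates, max_steps, rng):
    """Grow a linear pathway bottom-up, one reaction step (MDP action) at a time."""
    state = rng.choice(blocks)  # first action: pick a purchasable starting block
    steps = []
    for _ in range(max_steps):
        # Legal actions: every (template, purchasable partner) pair that matches.
        actions = [(t, b) for t in templates for b in blocks if t.matches(state, b)]
        if not actions:
            break  # no template applies, so the episode ends early
        t, b = rng.choice(actions)  # uniform random policy
        product = t.apply(state, b)
        steps.append((t.name, state.name, b.name, product.name))
        state = product
    return state, steps

blocks = [
    Mol("A", frozenset({"acid"})),
    Mol("B", frozenset({"acid", "amine"})),
    Mol("C", frozenset({"amine"})),
]
templates = [Template("toy_amide_coupling", "acid", "amine")]

rng = random.Random(0)  # seeded so the toy dataset is reproducible
dataset = [sample_pathway(blocks, templates, max_steps=3, rng=rng) for _ in range(5)]
for product, steps in dataset:
    print(product.name, len(steps))
```

In a real setting the templates would be expert-curated reaction rules applied to molecular graphs, and a trained policy would choose each action given the target embedding instead of sampling uniformly.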