Molecule generation is central to a variety of applications. Recent work has approached the generation task as subgraph prediction and assembly. Nevertheless, these methods usually rely on hand-crafted or external subgraph construction, and the subgraph assembly depends solely on local arrangement. In this paper, we define a novel notion, the principal subgraph, that is closely related to the informative patterns within molecules. Our proposed merge-and-update subgraph extraction method can automatically discover frequent principal subgraphs from the dataset, which previous methods cannot do. Moreover, we develop a two-step subgraph assembly strategy, which first predicts a set of subgraphs in a sequence-wise manner and then assembles all generated subgraphs globally into the final output molecule. Built upon a graph variational auto-encoder, our model is shown to be effective and efficient across several evaluation metrics, compared with state-of-the-art methods on distribution learning and (constrained) property optimization tasks.
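To give a rough intuition for what a merge-and-update style of subgraph extraction could look like, here is a minimal toy sketch. It is not the procedure from the paper, whose details are not given in this abstract. It assumes a frequency-driven loop in the spirit of byte-pair encoding: every atom starts as its own fragment, the most frequent pair of bonded fragments across the dataset is merged, and the fragment partitions are updated before the next count. The names `fragment_key`, `adjacent_merges`, and `extract_vocab` are hypothetical, molecules are plain atom/bond lists, and the fragment key is a crude stand-in for a real canonical form.

```python
from collections import Counter


def fragment_key(atoms, bonds, frag):
    """Crude, order-independent key for a fragment.

    Assumption: this is NOT a true canonical form (different structures
    can collide); a real system would use a canonical string instead.
    """
    symbols = tuple(sorted(atoms[i] for i in frag))
    inner = tuple(sorted(
        tuple(sorted((atoms[a], atoms[b])))
        for a, b in bonds if a in frag and b in frag
    ))
    return symbols, inner


def adjacent_merges(atoms, bonds, frags):
    """Yield (key, i, j) for each pair of fragments joined by a bond."""
    owner = {atom: idx for idx, frag in enumerate(frags) for atom in frag}
    seen = set()
    for a, b in bonds:
        i, j = sorted((owner[a], owner[b]))
        if i != j and (i, j) not in seen:
            seen.add((i, j))
            yield fragment_key(atoms, bonds, frags[i] | frags[j]), i, j


def extract_vocab(molecules, num_merges):
    """molecules: list of (atoms: dict[int, str], bonds: list[(int, int)])."""
    # Start with every atom as its own fragment.
    partitions = [[frozenset([i]) for i in atoms] for atoms, _ in molecules]
    vocab = []
    for _ in range(num_merges):
        # Merge step: count every candidate merge of two bonded fragments.
        counts = Counter()
        for (atoms, bonds), frags in zip(molecules, partitions):
            counts.update(k for k, _, _ in adjacent_merges(atoms, bonds, frags))
        if not counts:
            break
        best, freq = counts.most_common(1)[0]
        vocab.append((best, freq))
        # Update step: apply the chosen merge wherever it still occurs.
        for m, ((atoms, bonds), frags) in enumerate(zip(molecules, partitions)):
            merged = True
            while merged:
                merged = False
                for key, i, j in adjacent_merges(atoms, bonds, frags):
                    if key == best:
                        new = frags[i] | frags[j]
                        frags = [f for k, f in enumerate(frags)
                                 if k not in (i, j)] + [new]
                        merged = True
                        break
            partitions[m] = frags
    return vocab, partitions
```

On a toy dataset of methanol, ethanol, and dimethyl ether (heavy atoms only), one merge round picks the C-O pair as the first learned fragment, since it is the most frequent bonded pair. Candidate merges that overlap, such as the two C-O pairs sharing the oxygen in dimethyl ether, are both counted but only one can be applied.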