Drug discovery is a fundamental and ever-evolving field of research. Designing new candidate molecules requires large amounts of time and money, and computational methods are increasingly employed to cut these costs. Machine learning methods are well suited to designing large numbers of potential candidate molecules, which are naturally represented as graphs. Graph generation is being revolutionized by deep learning methods, and molecular generation is one of its most promising applications. In this paper, we introduce a sequential molecular graph generator based on a set of graph neural network modules, which we call MG^2N^2. At each step, a node or a group of nodes is added to the graph, along with its connections. The modular architecture simplifies the training procedure and allows a single module to be retrained independently. Sequentiality and modularity make the generation process interpretable. The input at each generative step is the subgraph produced during the previous steps, and the use of graph neural networks maximizes the information extracted from it. Experiments on unconditional generation with the QM9 and Zinc datasets show that our model is capable of generalizing molecular patterns seen during the training phase, without overfitting. The results indicate that our method is competitive, and outperforms challenging baselines for unconditional generation.
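The sequential, modular generation loop described above can be illustrated with a minimal sketch. This is not the MG^2N^2 implementation: the two functions `propose_node` and `propose_edges` are hypothetical stand-ins that sample at random, where the actual model would use trained graph neural network modules conditioned on the current subgraph. The atom vocabulary, the stopping probability, and the node cap are likewise assumptions chosen only for illustration.

```python
import random

ATOM_TYPES = ["C", "N", "O", "F"]  # assumed vocabulary, for illustration only
STOP = None                        # sentinel meaning "stop generating"


def propose_node(nodes, edges, rng):
    """Hypothetical stand-in for a node-generation module.

    A trained module would read the current subgraph (nodes, edges);
    this placeholder ignores it and samples at random.
    """
    if nodes and rng.random() < 0.2:
        return STOP
    return rng.choice(ATOM_TYPES)


def propose_edges(nodes, edges, new_index, rng):
    """Hypothetical stand-in for an edge-generation module.

    Links the new node to at least one earlier node so that the
    graph stays connected, plus a few random extra links.
    """
    if new_index == 0:
        return []
    chosen = {rng.randrange(new_index)}
    for j in range(new_index):
        if rng.random() < 0.1:
            chosen.add(j)
    return [(j, new_index) for j in sorted(chosen)]


def generate_graph(max_nodes=9, seed=0):
    """Grow a graph one node at a time, each with its connections."""
    rng = random.Random(seed)
    nodes, edges = [], []
    while len(nodes) < max_nodes:
        label = propose_node(nodes, edges, rng)
        if label is STOP:
            break
        new_index = len(nodes)
        new_edges = propose_edges(nodes, edges, new_index, rng)
        nodes.append(label)
        edges.extend(new_edges)
    return nodes, edges


if __name__ == "__main__":
    atoms, bonds = generate_graph()
    print(atoms)
    print(bonds)
```

Because each decision is delegated to a separate function, either placeholder could be swapped for a learned module without touching the loop, which is the kind of independent retraining the modular design is meant to allow.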