Graph neural networks (GNNs) have been used extensively to address problems in drug design and discovery. Both ligand and target molecules are represented as graphs, with node and edge features encoding information about atomic elements and bonds, respectively. Although existing deep learning models perform remarkably well at predicting physicochemical properties and binding affinities, generating new molecules with optimized properties remains challenging. Most GNNs inherently perform poorly at whole-graph representation because of the limitations of the message-passing paradigm. Furthermore, step-by-step graph generation frameworks that use reinforcement learning or other sequential processing can be slow and can produce a high proportion of invalid molecules, which then require substantial post-processing to satisfy the principles of stoichiometry. To address these issues, we propose a representation-first approach to molecular graph generation. We guide the latent representation of an autoencoder by capturing graph structure information with the geometric scattering transform, and we apply penalties that additionally structure the representation by molecular properties. We show that this highly structured latent space can be used directly for molecular graph generation with a generative adversarial network (GAN). We demonstrate that our architecture learns meaningful representations of drug datasets and provides a platform for goal-directed drug synthesis.
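To make the role of the geometric scattering transform more concrete, the following is a minimal sketch of how whole-graph features can be computed from a molecular graph. It assumes the common lazy-random-walk diffusion wavelet construction and statistical moments as the graph-level summary; the abstract does not specify these details, so the function names (`lazy_random_walk`, `diffusion_wavelets`, `scattering_features`), the number of scales, and the choice of moments are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def lazy_random_walk(adj):
    """Lazy random walk P = 1/2 (I + A D^-1) for an undirected graph."""
    deg = adj.sum(axis=0)
    deg = np.where(deg == 0, 1.0, deg)  # isolated nodes: avoid division by zero
    # Broadcasting divides column j of adj by deg[j], i.e. A D^-1.
    return 0.5 * (np.eye(adj.shape[0]) + adj / deg)


def diffusion_wavelets(adj, num_scales):
    """Return [I - P, P - P^2, P^2 - P^4, ...] with num_scales + 1 entries."""
    P = lazy_random_walk(adj)
    wavelets = [np.eye(adj.shape[0]) - P]
    P_pow = P
    for _ in range(num_scales):
        P_next = P_pow @ P_pow  # squaring doubles the diffusion time
        wavelets.append(P_pow - P_next)
        P_pow = P_next
    return wavelets


def scattering_features(adj, x, num_scales=3, moments=(1, 2, 3, 4)):
    """Zeroth-, first- and second-order scattering moments of node signal x.

    Hypothetical helper: the scales and moments are assumed defaults.
    """
    wavelets = diffusion_wavelets(adj, num_scales)

    def stats(signal):
        # Summing over nodes makes each statistic permutation invariant.
        return [np.sum(signal ** q) for q in moments]

    feats = stats(x)                             # zeroth order
    first = [np.abs(W @ x) for W in wavelets]
    for u in first:
        feats += stats(u)                        # first order
    for j, u in enumerate(first):
        for W in wavelets[j + 1:]:               # only coarser second wavelets
            feats += stats(np.abs(W @ u))        # second order
    return np.array(feats)


# Toy example: heavy-atom graph of ethanol (C-C-O), atomic number as the signal.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
x = np.array([6.0, 6.0, 8.0])
features = scattering_features(adj, x)
```

Because every statistic is a sum over nodes, the resulting vector has a fixed length regardless of molecule size and does not depend on atom ordering, which is the kind of whole-graph summary that could be fed to an autoencoder in a setup like the one described above.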