Generating molecules with desired biological activities has attracted growing attention in drug discovery. Previous molecular generation models are chemocentric: they rarely consider drug-target interactions, which limits their practical application. In this paper, we aim to generate drug molecules in a target-aware manner that bridges biological activity and molecular design. To this end, we compile a benchmark dataset from several publicly available datasets and build baselines in a unified framework. Building on recent advances in flow-based molecular generation models, we propose SiamFlow, which forces the flow to fit the distribution of target sequence embeddings in latent space. Specifically, we employ an alignment loss and a uniform loss to bring target sequence embeddings and drug graph embeddings into agreement while avoiding collapse. Furthermore, we formulate the alignment as a one-to-many problem by learning spaces of target sequence embeddings. Experiments show quantitatively that the proposed method learns meaningful latent-space representations for target-aware molecular graph generation and provides an alternative approach to bridging biology and chemistry in drug discovery.
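To make the roles of the two losses concrete, the following is a minimal sketch, not the SiamFlow implementation. It assumes the widely used formulation on unit-normalized embeddings, in which the alignment term is the mean squared distance between paired target and drug embeddings and the uniform term is the log of the mean Gaussian potential over all pairs within a batch. The abstract does not give the exact form, so the function names, the temperature `t`, and the toy vectors are all illustrative assumptions.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def alignment_loss(target_emb, drug_emb):
    """Mean squared distance between paired (target, drug) embeddings.

    Hypothetical form: lower values mean each target embedding lies
    closer to the embedding of its paired drug.
    """
    pairs = list(zip(target_emb, drug_emb))
    return sum(sq_dist(t, d) for t, d in pairs) / len(pairs)

def uniform_loss(emb, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs.

    Hypothetical form: the value is 0 when every embedding is identical
    (collapse) and becomes more negative as embeddings spread apart.
    The temperature t=2.0 is an assumed default, not taken from the paper.
    """
    n = len(emb)
    potentials = [
        math.exp(-t * sq_dist(emb[i], emb[j]))
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return math.log(sum(potentials) / len(potentials))

# Toy 2-D embeddings, purely illustrative.
targets = [l2_normalize(v) for v in [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]]
drugs = [l2_normalize(v) for v in [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]]
collapsed = [[1.0, 0.0]] * 3

align = alignment_loss(targets, drugs)
spread_uniform = uniform_loss(targets)
collapsed_uniform = uniform_loss(collapsed)
```

Under these assumptions, minimizing the alignment term alone could be satisfied by mapping every input to the same point. The uniform term penalizes that solution, which is one way to read the phrase "while avoiding collapse".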