Forming a high-quality molecular candidate set that contains a wide range of dissimilar compounds is crucial to the success of drug discovery. However, compared with research aimed at optimizing chemical properties, the question of how to measure and improve the variety of drug candidates is relatively understudied. In this paper, we first investigate the problem of properly measuring molecular variety through both an axiomatic analysis framework and an empirical study. Our analysis suggests that many existing measures are not suitable for evaluating the variety of molecules. We also propose new variety measures based on our analysis. We further explicitly integrate the proposed variety measures into the optimization objective of molecular generation models. Our experimental results demonstrate that this new optimization objective can guide molecular generation models to find compounds that cover a larger chemical space, providing the downstream phases with more distinctive drug candidate choices.