Efficiently discovering molecules that meet various property requirements can significantly benefit drug discovery. Because searching the entire chemical space is infeasible, recent works adopt generative models for goal-directed molecular generation. These methods are typically iterative: at each iteration they optimize the parameters of the molecular generative model to produce promising molecules for further validation, and assessments of the generated molecules provide the direction for model optimization. However, most previous works require a massive number of expensive and time-consuming assessments, e.g., wet-lab experiments and molecular dynamics simulations, which limits their practicality. To reduce the number of assessments in the iterative process, we propose a cost-effective evolution strategy in latent space, which optimizes the latent representation vectors of molecules instead of the model parameters. We adopt a pre-trained molecular generative model to map between the latent and observation spaces, taking advantage of large-scale unlabeled molecules to learn chemical knowledge. To further reduce the number of expensive assessments, we introduce a pre-screener as a proxy for the assessments. We conduct extensive experiments on multiple optimization tasks, comparing the proposed framework with several advanced techniques, and show that it achieves better performance with fewer assessments.
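The loop described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy identity `decode` in place of the pre-trained generative model, a toy quadratic `expensive_assessment` in place of a wet-lab experiment or simulation, and a ridge-regression `PreScreener` operating on latent vectors. All names, hyperparameters, and the simple Gaussian evolution strategy are hypothetical choices made for illustration.

```python
import numpy as np


def decode(z):
    """Hypothetical stand-in for a pre-trained decoder (latent -> molecule)."""
    return z  # identity map, for illustration only


def expensive_assessment(x):
    """Toy oracle standing in for a costly assessment; the optimum is at x = 1."""
    return -float(np.sum((x - 1.0) ** 2))


class PreScreener:
    """Assumed proxy model: ridge regression on quadratic latent features."""

    def __init__(self, reg=1e-3):
        self.reg = reg
        self.w = None

    @staticmethod
    def _features(Z):
        # Bias term, linear terms, and squared terms of each latent coordinate.
        return np.hstack([np.ones((len(Z), 1)), Z, Z ** 2])

    def fit(self, Z, y):
        phi = self._features(Z)
        gram = phi.T @ phi + self.reg * np.eye(phi.shape[1])
        self.w = np.linalg.solve(gram, phi.T @ y)

    def predict(self, Z):
        return self._features(Z) @ self.w


def latent_evolution(dim=4, iterations=15, population=64,
                     budget_per_iter=8, sigma=0.5, seed=0):
    """Evolve a search distribution over latent vectors, not model parameters."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    screener = PreScreener()
    z_hist, y_hist = [], []
    best_z, best_score, n_calls = None, -np.inf, 0

    for _ in range(iterations):
        # Sample a large population of candidate latent vectors.
        candidates = mean + sigma * rng.standard_normal((population, dim))

        if screener.w is None:
            # No assessed data yet: take an arbitrary subset of the samples.
            selected = candidates[:budget_per_iter]
        else:
            # Pre-screen: keep only the candidates the proxy ranks highest.
            predicted = screener.predict(candidates)
            selected = candidates[np.argsort(predicted)[-budget_per_iter:]]

        # Spend the assessment budget on the pre-screened candidates only.
        scores = np.array([expensive_assessment(decode(z)) for z in selected])
        n_calls += len(selected)

        # Refit the proxy on everything assessed so far.
        z_hist.extend(selected)
        y_hist.extend(scores)
        screener.fit(np.array(z_hist), np.array(y_hist))

        # Move the search distribution toward the best assessed candidates.
        order = np.argsort(scores)
        elite = selected[order[-(budget_per_iter // 2):]]
        mean = elite.mean(axis=0)
        sigma *= 0.9

        if scores[order[-1]] > best_score:
            best_score = float(scores[order[-1]])
            best_z = selected[order[-1]].copy()

    return best_z, best_score, n_calls


if __name__ == "__main__":
    z, score, calls = latent_evolution()
    print("best score:", score, "expensive assessments used:", calls)
```

In this sketch each iteration samples 64 candidates but assesses only 8 of them, so the number of expensive calls is fixed by `budget_per_iter` and not by the population size. That is the role the abstract assigns to the pre-screener.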