Molecular optimization is a fundamental goal in the chemical sciences and is of central interest to drug and material design. In recent years, significant progress has been made on challenging problems across computational molecular optimization, with emphasis on high validity, diversity, and, most recently, synthesizability. Despite this progress, many papers report results on trivial or self-designed tasks, which makes it difficult to directly assess the performance of new methods. Moreover, the sample efficiency of the optimization, measured by the number of molecules evaluated by the oracle, is rarely discussed, despite being an essential consideration for realistic discovery applications. To fill this gap, we have created an open-source benchmark for practical molecular optimization, PMO, to facilitate the transparent and reproducible evaluation of algorithmic advances in molecular optimization. This paper thoroughly investigates the performance of 25 molecular design algorithms on 23 tasks with a particular focus on sample efficiency. Our results show that most "state-of-the-art" methods fail to outperform their predecessors under a limited oracle budget of 10K queries and that no existing algorithm can efficiently solve certain molecular optimization problems in this setting. We analyze the influence of optimization algorithm choices, molecular assembly strategies, and oracle landscapes on optimization performance to inform future algorithm development and benchmarking. PMO provides a standardized experimental setup for comprehensively evaluating new molecular optimization methods and comparing them with existing ones. All code can be found at https://github.com/wenhao-gao/mol_opt.
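The notion of sample efficiency under a fixed oracle budget can be illustrated with a small wrapper that counts oracle queries and stops once the budget is spent. This is a minimal sketch and not the PMO implementation: the names `BudgetedOracle`, `BudgetExhausted`, and `toy_score` are hypothetical, the scoring function is a toy stand-in for a real property oracle, and the choice not to charge repeated molecules against the budget is an assumption made here for illustration.

```python
from typing import Callable, Dict, List


class BudgetExhausted(Exception):
    """Raised when a new molecule is queried after the budget is spent."""


class BudgetedOracle:
    """Hypothetical wrapper that limits the number of oracle queries.

    Assumption: a molecule that was already scored is served from a cache
    and does not count against the budget.
    """

    def __init__(self, score_fn: Callable[[str], float], budget: int = 10_000):
        self.score_fn = score_fn
        self.budget = budget
        self.cache: Dict[str, float] = {}
        # Best score seen so far, recorded after each new (uncached) query.
        self.best_so_far: List[float] = []

    @property
    def calls(self) -> int:
        """Number of distinct molecules evaluated so far."""
        return len(self.cache)

    def __call__(self, smiles: str) -> float:
        if smiles in self.cache:
            return self.cache[smiles]
        if self.calls >= self.budget:
            raise BudgetExhausted(f"budget of {self.budget} queries spent")
        score = self.score_fn(smiles)
        self.cache[smiles] = score
        previous_best = self.best_so_far[-1] if self.best_so_far else score
        self.best_so_far.append(max(previous_best, score))
        return score


def toy_score(smiles: str) -> float:
    """Toy stand-in for a property oracle: fraction of 'C' characters."""
    return smiles.count("C") / max(len(smiles), 1)


# Usage: an optimizer would call the oracle until the budget runs out.
oracle = BudgetedOracle(toy_score, budget=3)
for candidate in ["CCO", "CCO", "CCC", "O"]:
    oracle(candidate)
```

Recording the best score after each query gives a curve of performance against oracle calls, which lets methods be compared at the same budget rather than only at convergence.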