Recent developments in artificial intelligence and automation could enable a new drug design paradigm: autonomous drug design. Under this paradigm, generative models propose thousands of molecules with specified properties. However, since only a limited number of these molecules can be synthesized and tested, an obvious challenge is how to select them efficiently. We formulate this task as a contextual stochastic multi-armed bandit problem with multiple plays and volatile arms. To solve it, we extend previous work on multi-armed bandits to this setting and compare our solution with random sampling, greedy selection and decaying-epsilon-greedy selection. To investigate how the different selection strategies affect the cumulative reward and the diversity of the selections, we simulate the drug design process. The simulation results suggest that our approach has the potential to explore and exploit the chemical space more effectively for autonomous drug design.
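The abstract does not spell out the authors' algorithm. The sketch below is one hedged illustration of the problem setting, not the paper's method. It adapts the standard LinUCB idea to multiple plays and volatile arms. A single linear reward model is shared across hypothetical molecule feature vectors, a fresh candidate pool arrives every round, and the `k` candidates with the highest upper confidence bound are selected per round. The names `LinUCBMultiPlay` and `simulate`, the 3-dimensional features, the linear reward with Gaussian noise, and the `alpha`/`lam` values are all assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


class LinUCBMultiPlay:
    """Hypothetical LinUCB-style contextual bandit that picks k arms per round.

    Arms are volatile: each round supplies a fresh list of context vectors
    (e.g. assumed descriptors of newly generated molecules), so the model
    learns one shared linear reward model instead of per-arm statistics.
    """

    def __init__(self, dim: int, alpha: float = 1.0, lam: float = 1.0) -> None:
        self.dim = dim
        self.alpha = alpha  # exploration strength (illustrative default)
        # A = lam * I + sum(x x^T); only its inverse is stored.
        self.a_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * x

    def _matvec(self, m: List[List[float]], v: Sequence[float]) -> List[float]:
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

    def theta(self) -> List[float]:
        # Ridge-regression estimate theta = A^{-1} b.
        return self._matvec(self.a_inv, self.b)

    def select(self, contexts: Sequence[Sequence[float]], k: int) -> List[int]:
        # Score every candidate by its upper confidence bound; return top-k indices.
        th = self.theta()
        scores = []
        for x in contexts:
            ax = self._matvec(self.a_inv, x)
            mean = sum(t * xi for t, xi in zip(th, x))
            width = math.sqrt(max(0.0, sum(xi * axi for xi, axi in zip(x, ax))))
            scores.append(mean + self.alpha * width)
        order = sorted(range(len(contexts)), key=lambda i: scores[i], reverse=True)
        return order[:k]

    def update(self, x: Sequence[float], reward: float) -> None:
        # Sherman-Morrison rank-one update of A^{-1} after adding x x^T to A.
        ax = self._matvec(self.a_inv, x)
        denom = 1.0 + sum(xi * axi for xi, axi in zip(x, ax))
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        for i in range(self.dim):
            self.b[i] += reward * x[i]


def simulate(rounds: int = 200, pool: int = 50, k: int = 5,
             seed: int = 0) -> Tuple[LinUCBMultiPlay, float]:
    # Hypothetical environment: reward is linear in a 3-d context plus noise.
    rng = random.Random(seed)
    true_theta = [1.0, -0.5, 0.25]
    bandit = LinUCBMultiPlay(dim=3, alpha=0.5)
    total = 0.0
    for _ in range(rounds):
        # A new batch of candidates every round (volatile arms).
        candidates = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(pool)]
        # Multiple plays: choose k candidates at once, then learn from all of them.
        for idx in bandit.select(candidates, k):
            x = candidates[idx]
            r = sum(t * xi for t, xi in zip(true_theta, x)) + rng.gauss(0, 0.1)
            bandit.update(x, r)
            total += r
    return bandit, total
```

Setting `alpha=0` reduces this sketch to greedy selection by predicted reward, one of the baselines the abstract names. A larger `alpha` favours candidates the model is still uncertain about.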