Search algorithms for bandit problems are applicable to materials discovery. However, the objective of the conventional bandit problem differs from that of materials discovery: the conventional bandit problem aims to maximize the total reward, whereas materials discovery aims to achieve breakthroughs in material properties. The max K-armed bandit (MKB) problem, which aims to acquire the single best reward, matches discovery tasks better than the conventional bandit. We therefore propose a search algorithm for materials discovery based on the MKB problem, using a pseudo-value of the upper confidence bound of the expected improvement over the best reward. This approach is pseudo-guaranteed to be an asymptotic oracle that does not depend on the time horizon. In addition, compared with other MKB algorithms, the proposed algorithm has only one hyperparameter, which is advantageous in materials discovery. We applied the proposed algorithm to synthetic problems and to molecular-design demonstrations using a Monte Carlo tree search. The results show that the proposed algorithm stably outperformed other bandit algorithms in the late stage of the search, where the optimal arm of the MKB cannot be determined from its expected reward.
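To make the setting concrete, the following is a minimal, hedged sketch of a max K-armed bandit loop in which each arm's selection index is a UCB-style optimistic bound on the improvement over the best reward observed so far. The function and parameter names (`mkb_ucb_ei`, the single exploration weight `c`) are illustrative assumptions for this sketch, not the paper's exact pseudo-value or notation; Gaussian arm rewards are likewise assumed only for the demonstration.

```python
import math
import random


def mkb_ucb_ei(reward_fns, horizon, c=1.0, seed=0):
    """Illustrative max K-armed bandit search (a sketch, not the
    paper's exact algorithm).

    Each arm keeps a running mean and variance (Welford's method);
    the selection index is an optimistic bound on the improvement of
    the next draw over the best reward seen so far. `c` plays the
    role of the single exploration hyperparameter mentioned in the
    abstract (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    m2 = [0.0] * n_arms          # sum of squared deviations per arm
    best = -math.inf             # single best reward acquired so far

    for t in range(horizon):
        if t < n_arms:
            a = t                # pull every arm once to initialize
        else:
            def index(i):
                std = math.sqrt(m2[i] / counts[i]) if counts[i] > 1 else 1.0
                bonus = c * std * math.sqrt(math.log(t + 1) / counts[i])
                # optimistic estimate of improvement over the best reward
                return (means[i] + bonus) - best
            a = max(range(n_arms), key=index)

        r = reward_fns[a](rng)   # draw one reward from the chosen arm
        counts[a] += 1
        delta = r - means[a]
        means[a] += delta / counts[a]
        m2[a] += delta * (r - means[a])
        best = max(best, r)

    return best
```

Note that, unlike a cumulative-regret bandit, the loop is scored only by `best`: a high-variance arm with a mediocre mean can still be the right arm to pull, which is exactly the late-stage regime the abstract describes.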