Stochastic Rising Bandits is a setting in which the expected reward of each available option increases every time it is selected. This framework models a wide range of scenarios in which the available options are learning entities whose performance improves over time. In this paper, we focus on the Best Arm Identification (BAI) problem for stochastic rested rising bandits. In this scenario, given a fixed budget of rounds, we are asked to recommend the best option at the end of the selection process. We propose two algorithms for this setting: R-UCBE, which adopts a UCB-like approach, and R-SR, which employs a successive rejects procedure. We show that both provide guarantees on the probability of correctly identifying the optimal option at the end of the learning process. Finally, we numerically validate the proposed algorithms in synthetic and realistic environments and compare them with currently available BAI strategies.
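To make the fixed-budget setting concrete, the following is a minimal illustrative sketch of a plain successive-rejects allocation run against arms whose expected rewards rise with their pull counts. This is not the paper's R-SR algorithm; the concave rising-reward model, the noise level, and all numerical values below are assumptions made purely for demonstration.

```python
import random

def rising_reward(cap, n_pulls, rate=0.3):
    # Assumed rested rising model: the expected reward grows
    # concavely toward `cap` as the arm accumulates pulls.
    return cap * (1.0 - (1.0 + n_pulls) ** (-rate))

def successive_rejects(caps, budget, seed=0):
    # Standard successive-rejects phase schedule (not the paper's R-SR):
    # split the budget into K-1 phases and drop the empirically worst
    # active arm at the end of each phase.
    rng = random.Random(seed)
    K = len(caps)
    active = list(range(K))
    pulls = [0] * K
    totals = [0.0] * K
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    prev_n = 0
    for phase in range(1, K):
        n_phase = int((budget - K) / (log_bar * (K + 1 - phase)))
        for arm in active:
            for _ in range(n_phase - prev_n):
                mu = rising_reward(caps[arm], pulls[arm])
                totals[arm] += mu + rng.gauss(0, 0.05)
                pulls[arm] += 1
        prev_n = n_phase
        worst = min(active, key=lambda a: totals[a] / pulls[a])
        active.remove(worst)
    return active[0]  # recommended arm after the budget is spent

best = successive_rejects(caps=[0.4, 0.6, 0.9], budget=3000)
```

Because the assumed reward model is monotone in each arm's cap, the surviving arm is the one with the largest long-run potential; R-SR's contribution in the paper is to retain this kind of guarantee when rewards change with every pull, which a naive empirical-mean comparison does not handle in general.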