In this paper, we analyze the continuous-armed bandit problem for nonconvex cost functions under certain smoothness and sublevel set assumptions. We first derive an upper bound on the expected cumulative regret of a simple bin splitting method. We then propose an adaptive bin splitting method that can significantly improve performance. Furthermore, we derive a minimax lower bound, which shows that the new adaptive method achieves locally minimax optimal expected cumulative regret.