Symmetry arises in many optimization and decision-making problems and has attracted considerable attention from the optimization community: by exploiting such symmetries, the search for optimal solutions can be accelerated significantly. Despite this success in (offline) optimization, the use of symmetry has not been well examined in online optimization, especially in the bandit literature. In this paper, we therefore study the invariant Lipschitz bandit setting, a subclass of Lipschitz bandits in which the reward function and the set of arms are preserved under a group of transformations. We introduce an algorithm named \texttt{UniformMesh-N}, which naturally integrates side observations obtained from group orbits into the \texttt{UniformMesh} algorithm (\cite{Kleinberg2005_UniformMesh}), which uniformly discretizes the set of arms. Using this side-observation approach, we prove an improved regret upper bound that depends on the cardinality of the group, provided the group is finite. We also prove a matching lower bound on the regret for the invariant Lipschitz bandit class (up to logarithmic factors). We hope that our work will spark further investigation of symmetry in bandit theory and sequential decision-making theory in general.
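To make the algorithmic idea concrete, the following is a minimal, hypothetical Python sketch of the approach described above (not the paper's exact procedure): a UCB-style rule over a uniform mesh of the arm set, where each observed reward also updates every mesh cell in the group orbit of the pulled cell. The interface \texttt{pull}, the helper \texttt{orbit}, and the mesh/confidence parameters are assumptions introduced purely for illustration.
\begin{verbatim}
import numpy as np
from itertools import product

def uniform_mesh_n(pull, T, d=1, mesh_size=10, orbit=lambda c: [c]):
    """Illustrative sketch: UniformMesh-style bandit with orbit side observations.

    pull(x)  -- returns a noisy (bounded) reward for an arm x in [0,1]^d  (assumed interface)
    orbit(c) -- returns the mesh cells in the group orbit of cell c, including c itself
                (hypothetical helper standing in for the group action on the arm set)
    """
    cells = list(product(range(mesh_size), repeat=d))            # uniform discretization of [0,1]^d
    centers = {c: (np.array(c) + 0.5) / mesh_size for c in cells}
    counts = {c: 0 for c in cells}
    sums = {c: 0.0 for c in cells}

    for t in range(1, T + 1):
        def ucb(c):                                              # UCB index over mesh cells
            if counts[c] == 0:
                return float("inf")
            return sums[c] / counts[c] + np.sqrt(2 * np.log(t) / counts[c])
        chosen = max(cells, key=ucb)
        r = pull(centers[chosen])
        # Side observations: since the reward function is invariant under the group,
        # the same reward sample is credited to every cell in the orbit of the chosen cell.
        for c in orbit(chosen):
            counts[c] += 1
            sums[c] += r

    best = max(cells, key=lambda c: sums[c] / counts[c] if counts[c] else -np.inf)
    return centers[best]                                         # empirically best cell center
\end{verbatim}
For example, if the reward were invariant under permutations of the coordinates, \texttt{orbit} would return all coordinate permutations of a cell index, so a single pull informs every symmetric copy of that cell; this is the mechanism through which the regret bound can improve with the cardinality of the group.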