In this paper, we study Lipschitz bandit problems with batched feedback, where the expected reward is Lipschitz and the reward observations are communicated to the player in batches. We introduce a novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), that optimally solves this problem. Specifically, we show that for a $T$-step problem with Lipschitz reward of zooming dimension $d_z$, our algorithm achieves the theoretically optimal regret rate of $\widetilde{\mathcal{O}}\left(T^{\frac{d_z + 1}{d_z + 2}}\right)$ using only $\mathcal{O}\left(\log\log T\right)$ batches. We also provide a complexity analysis for this problem. Our theoretical lower bound implies that $\widetilde{\Omega}(\log\log T)$ batches are necessary for any algorithm to achieve the optimal regret. Thus, up to logarithmic factors, BLiN achieves the optimal regret rate using minimal communication.
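To give a feel for the two quantities in the stated bounds, the following minimal sketch evaluates the regret exponent $\frac{d_z + 1}{d_z + 2}$ and the value of $\log\log T$ for a few horizons. It is purely illustrative arithmetic, not an implementation of BLiN. The function names `regret_exponent` and `loglog` are hypothetical, natural logarithms are assumed, and the constants and logarithmic factors hidden by $\widetilde{\mathcal{O}}$ and $\mathcal{O}$ are ignored.

```python
import math


def regret_exponent(d_z: float) -> float:
    """Exponent of T in the rate T^((d_z + 1) / (d_z + 2)).

    Hypothetical helper for illustration; d_z is the zooming dimension.
    """
    return (d_z + 1) / (d_z + 2)


def loglog(T: float) -> float:
    """Value of log(log(T)) with natural logs (an assumption).

    This only tracks the growth rate of the batch count; the constant
    hidden by the O(.) notation is not modeled.
    """
    return math.log(math.log(T))


if __name__ == "__main__":
    # The exponent increases toward 1 as the zooming dimension grows.
    for d_z in (0, 1, 2, 5):
        print(f"d_z = {d_z}: regret exponent = {regret_exponent(d_z):.3f}")

    # log log T grows very slowly with the horizon T.
    for T in (10**3, 10**6, 10**9):
        print(f"T = {T}: log log T = {loglog(T):.2f}")
```

The sketch makes the trade-off in the abstract concrete: the regret exponent depends only on $d_z$, while the number of batches needed scales with a quantity that barely changes even when $T$ grows by many orders of magnitude.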