多Ared 土匪区域搜索最佳最优化算法 (Optimal Algorithms for Range Searching over Multi-Armed Bandits)

This paper studies a multi-armed bandit (MAB) version of the range-searching problem. In its basic form, range searching considers as input a set of points (on the real line) and a collection of (real) intervals. Here, with each specified point, we have an associated weight, and the problem objective is to find a maximum-weight point within every given interval. The current work addresses range searching with stochastic weights: each point corresponds to an arm (that admits sample access) and the point's weight is the (unknown) mean of the underlying distribution. In this MAB setup, we develop sample-efficient algorithms that find, with high probability, near-optimal arms within the given intervals, i.e., we obtain PAC (probably approximately correct) guarantees. We also provide an algorithm for a generalization wherein the weight of each point is a multi-dimensional vector. The sample complexities of our algorithms depend, in particular, on the size of the optimal hitting set of the given intervals. Finally, we establish lower bounds proving that the obtained sample complexities are essentially tight. Our results highlight the significance of geometric constructs -- specifically, hitting sets -- in our MAB setting.

翻译：本文研究一个多臂强盗( MAB) 版本的测距搜索问题。以其基本形式, 范围搜索将一组点( 真实线上) 和一系列( 真实) 间隔作为输入输入。这里, 每指定一点, 我们都有相关的重量, 问题的目标是在每个特定的间隔内找到一个最大重量点。目前的工作用随机重量来研究范围: 每个点对应一个手臂( 允许抽样访问), 点的重量是底部分布的( 未知的) 平均值。在这个测距设置中, 我们开发了一个样本效率低的算法, 在给定的间隔内发现一组极有可能接近最佳的手臂, 也就是说, 我们得到了PAC( 可能大致正确) 的保证。我们还为总化提供了一种算法, 其中每个点的重量是多维矢量。我们算法的抽样复杂度取决于特定间隔内最佳打击量的大小。最后, 我们设定了更低的界限, 证明获得的取样复杂性基本上很紧。我们的测算结果显示我们的测距的意义。