Optimal Clustering with Bandit Feedback

This paper considers the problem of online clustering with bandit feedback. A set of arms (or items) can be partitioned into groups that are unknown to the agent. Within each group, the observations associated with each arm follow the same distribution with the same mean vector. At each time step, the agent queries (pulls) an arm and obtains an independent observation from the distribution associated with that arm. Each pull may depend on the earlier pulls and on the samples they produced. The agent's task is to uncover the underlying partition of the arms using the fewest arm pulls, with a probability of error not exceeding a prescribed constant $\delta$. The proposed problem has numerous applications, ranging from clustering virus variants to online market segmentation. We present an instance-dependent information-theoretic lower bound on the expected sample complexity for this task, and design a computationally efficient and asymptotically optimal algorithm, namely Bandit Online Clustering (BOC). The algorithm includes a novel stopping rule for adaptive sequential testing that circumvents the need to exactly solve any NP-hard weighted clustering problem as a subroutine. We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically and significantly outperforms a non-adaptive baseline algorithm.
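To make the feedback model concrete, the following is a minimal sketch of the problem setup only. It does not reproduce BOC or its stopping rule. The class `ClusteredBandit`, the helper functions, the Gaussian noise model, and the distance threshold are all assumptions introduced for illustration. The sampling strategy shown is plain uniform (round-robin) pulling, which is one possible non-adaptive approach and not necessarily the baseline used in the paper.

```python
import math
import random


class ClusteredBandit:
    """Hypothetical environment: arms in the same hidden cluster share a mean vector."""

    def __init__(self, cluster_means, assignment, noise_std=1.0, seed=0):
        self.cluster_means = cluster_means  # one mean vector per cluster
        self.assignment = assignment        # hidden cluster index of each arm
        self.noise_std = noise_std          # assumption: isotropic Gaussian noise
        self.rng = random.Random(seed)
        self.num_pulls = 0

    def pull(self, arm):
        """Return one independent noisy observation of the arm's cluster mean."""
        self.num_pulls += 1
        mean = self.cluster_means[self.assignment[arm]]
        return [m + self.rng.gauss(0.0, self.noise_std) for m in mean]


def uniform_sampling_estimates(env, num_arms, pulls_per_arm):
    """Non-adaptive round-robin: pull every arm equally often, return empirical means."""
    dim = len(env.cluster_means[0])
    sums = [[0.0] * dim for _ in range(num_arms)]
    for _ in range(pulls_per_arm):
        for arm in range(num_arms):
            obs = env.pull(arm)
            for d in range(dim):
                sums[arm][d] += obs[d]
    return [[s / pulls_per_arm for s in row] for row in sums]


def greedy_threshold_partition(estimates, threshold):
    """Illustrative grouping: join the first cluster whose representative is close enough."""
    representatives = []  # empirical mean of the first arm placed in each cluster
    labels = []
    for est in estimates:
        for label, rep in enumerate(representatives):
            if math.dist(est, rep) <= threshold:
                labels.append(label)
                break
        else:
            representatives.append(est)
            labels.append(len(representatives) - 1)
    return labels


# Five arms in two well-separated hidden clusters.
env = ClusteredBandit(
    cluster_means=[[0.0, 0.0], [5.0, 5.0]],
    assignment=[0, 1, 0, 1, 1],
)
estimates = uniform_sampling_estimates(env, num_arms=5, pulls_per_arm=200)
labels = greedy_threshold_partition(estimates, threshold=2.5)
print(labels, env.num_pulls)
```

This sketch fixes the number of pulls in advance, so it spends the same budget on every arm regardless of how hard that arm is to place. An adaptive method such as the one the abstract describes would instead choose each pull from past observations and decide when to stop based on the target error probability $\delta$.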
