Dueling Bandits with Team Comparisons

We introduce the dueling teams problem, a new online-learning setting in which the learner observes noisy comparisons of disjoint pairs of $k$-sized teams drawn from a universe of $n$ players. The learner's goal is to minimize the number of duels required to identify, with high probability, a Condorcet winning team, i.e., a team that wins against any other disjoint team with probability at least $1/2$. Noisy comparisons are linked to a total order on the teams. We formalize our model by building upon the dueling bandits setting (Yue et al., 2012) and provide several algorithms for both the stochastic and deterministic settings. For the stochastic setting, we provide a reduction to the classical dueling bandits setting, yielding an algorithm that identifies a Condorcet winning team within $\mathcal{O}((n + k \log (k)) \frac{\max(\log\log n, \log k)}{\Delta^2})$ duels, where $\Delta$ is a gap parameter. For deterministic feedback, we additionally present a gap-independent algorithm that identifies a Condorcet winning team within $\mathcal{O}(nk\log(k)+k^5)$ duels.
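To make the definition of a Condorcet winning team concrete, the following is a minimal brute-force sketch for the deterministic setting. It is not one of the algorithms from the paper: it enumerates every team and every disjoint rival, which takes far more duels than the bounds stated above. The additive player values, the `duel` function, and the player names are all hypothetical assumptions introduced only for illustration; the abstract only requires that comparisons be consistent with a total order on teams.

```python
from itertools import combinations


def duel(team_a, team_b, value):
    """Deterministic duel under an ASSUMED additive model (not from the paper):
    team_a wins if the sum of its players' values is strictly larger."""
    return sum(value[p] for p in team_a) > sum(value[p] for p in team_b)


def condorcet_winning_teams(players, k, value):
    """Brute force: return every k-sized team that beats all disjoint k-sized teams.

    Assumes len(players) >= 2 * k so that each team has at least one disjoint rival.
    """
    teams = [frozenset(c) for c in combinations(players, k)]
    winners = []
    for team in teams:
        rivals = [t for t in teams if t.isdisjoint(team)]
        if all(duel(team, rival, value) for rival in rivals):
            winners.append(team)
    return winners


# Hypothetical universe of n = 6 players. Powers of two guarantee that no two
# distinct teams have equal total value, so the induced order on teams is total.
value = {"a": 32, "b": 16, "c": 8, "d": 4, "e": 2, "f": 1}
winners = condorcet_winning_teams(sorted(value), 2, value)
```

In this toy instance every team that contains player `a` beats every team disjoint from it, so there are several Condorcet winning teams. This is consistent with the abstract asking for "a" Condorcet winning team rather than a unique one, since a team only has to beat the teams it shares no players with.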
