We investigate the problem of bandits with expert advice when the experts are fixed and known distributions over the actions. Improving on previous analyses, we show that the regret in this setting is controlled by information-theoretic quantities that measure the similarity between experts. In some natural special cases, this allows us to obtain the first regret bound for EXP4 that can get arbitrarily close to zero if the experts are similar enough. For a different algorithm, we provide another bound that quantifies the similarity between the experts in terms of the KL divergence, and we show that in some cases it can be smaller than the EXP4 bound. Additionally, we provide lower bounds for certain classes of experts, showing that the algorithms we analyze are nearly optimal in some cases.
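To make the setting concrete: in EXP4, each expert recommends a distribution over actions, the learner plays an action drawn from the weighted mixture of this advice, and the expert weights are updated via exponential weights on importance-weighted loss estimates. Below is a minimal sketch for the case studied here, where the expert distributions are fixed, assuming losses in [0, 1]; the learning rate `eta`, the Dirichlet-generated experts, and the Bernoulli loss oracle are illustrative choices only, not the tuning analyzed in the paper.

```python
import numpy as np

def exp4_fixed_experts(experts, loss_fn, T, eta, seed=None):
    """EXP4 with fixed expert advice.

    experts: (N, K) array; row i is expert i's fixed distribution over K actions.
    loss_fn: callable(t, action) -> loss in [0, 1].
    T: horizon; eta: learning rate.
    """
    rng = np.random.default_rng(seed)
    experts = np.asarray(experts, dtype=float)
    N, K = experts.shape
    log_w = np.zeros(N)                      # log-weights over experts
    total_loss = 0.0
    for t in range(T):
        q = np.exp(log_w - log_w.max())
        q /= q.sum()                         # distribution over experts
        p = q @ experts                      # induced distribution over actions
        a = rng.choice(K, p=p)               # sample and play an action
        loss = loss_fn(t, a)
        total_loss += loss
        # importance-weighted estimate of each expert's loss this round:
        # expert i suffers xi_i(a) * loss / p(a)
        est_expert_loss = experts[:, a] * (loss / p[a])
        log_w -= eta * est_expert_loss       # exponential-weights update
    return total_loss

# Illustrative run: 10 fixed experts over 5 actions, Bernoulli losses.
rng = np.random.default_rng(0)
experts = rng.dirichlet(np.ones(5), size=10)
means = rng.uniform(size=5)                  # per-action Bernoulli loss means
cumulative = exp4_fixed_experts(
    experts, lambda t, a: float(rng.random() < means[a]), T=10_000, eta=0.05
)
```

Working in log-space keeps the weight update numerically stable over long horizons, and because the advice is fixed, the mixture `p` depends on the history only through the expert weights `q`.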