Bandits with Knapsacks (BwK) is a general model for multi-armed bandits under supply/budget constraints. While worst-case regret bounds for BwK are well-understood, we present three results that go beyond the worst-case perspective. First, we provide upper and lower bounds which amount to a full characterization for logarithmic, instance-dependent regret rates. Second, we consider "simple regret" in BwK, which tracks algorithm's performance in a given round, and prove that it is small in all but a few rounds. Third, we provide a general "reduction" from BwK to bandits which takes advantage of some known helpful structure, and apply this reduction to combinatorial semi-bandits, linear contextual bandits, and multinomial-logit bandits. Our results build on the BwK algorithm from \citet{AgrawalDevanur-ec14}, providing new analyses thereof.
翻译:使用 Knapsacks (BwK) 的强盗是多武装强盗在供应/预算限制下的一般模式。 虽然 BwK 最坏的遗憾界限已经非常清楚, 但我们展示了三种结果, 超越了最坏的视角。 首先, 我们提供上下界限, 相当于对数率的完整描述。 第二, 我们考虑 BwK 中的“ 简单遗憾 ”, 它跟踪了算法在特定回合中的表现, 并证明算法在几轮之外都很小。 第三, 我们提供从 BwK 到强盗的一般“ 削减 ”, 利用一些已知的有用结构, 并将这一削减应用于组合式半带、 线性背景强盗和多种族- logit 强盗。 我们的结果建立在\ citet{ AgrawalDevanur- ec14} 的 BwK 算法上, 提供了新的分析 。