We consider a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We present a new algorithm that is simple, computationally efficient, and admits vanishing regret. It is statistically optimal for CBwK when an algorithm must stop once some constraint is violated. Our algorithm builds on LagrangeBwK (Immorlica et al., FOCS 2019) , a Lagrangian-based technique for CBwK, and SquareCB (Foster and Rakhlin, ICML 2020), a regression-based technique for contextual bandits. Our analysis leverages the inherent modularity of both techniques.
翻译:我们考虑上下文感知多臂老虎机算法的变体,其中算法消耗多个资源,受到总消耗的线性约束。该问题推广了带背包的上下文感知多臂老虎机(CBwK),允许装箱和覆盖约束,以及正和负资源消耗。我们提出了一种新算法,它简单、计算效率高,并且具有逐渐减小的遗憾。当一个算法必须在某个约束条件被违反时停止时,它在CBwK上是统计上最优的。我们的算法建立在LagrangeBwK(Immorlica等人,FOCS 2019)和SquareCB(Foster和Rakhlin,ICML 2020)之上,这是一种基于Lagrangian的技术和一种基于回归的技术。我们的分析利用了两种技术的固有模块性。