Bandits with Knapsacks (BwK), the generalization of Multi-Armed Bandits to settings with budget constraints, has received much attention in recent years. It has numerous applications, including dynamic pricing and repeated auctions. Previous work has focused on one of two extremes: Stochastic BwK, where the rewards and resource consumptions in each round are sampled i.i.d. from a fixed distribution, and Adversarial BwK, where these values are picked by an adversary. Achievable guarantees in the two cases exhibit a massive gap: no-regret learning is achievable in Stochastic BwK, but in Adversarial BwK only competitive-ratio-style guarantees are achievable, where the competitive ratio depends on the budget. What makes this gap so vast is that in Adversarial BwK the guarantees get worse in the typical case when the budget is more binding. While ``best-of-both-worlds'' algorithms are known (algorithms that provide the best achievable guarantee in each of the two extreme cases), their guarantees degrade to the adversarial ones as soon as the environment is not fully stochastic. Our work aims to bridge this gap, offering guarantees for workloads that are not exactly stochastic yet far from worst-case. We define a condition, Approximately Stationary BwK, that parameterizes how close an instance is to the stochastic or the adversarial extreme. Based on these parameters, we explore the best competitive ratio attainable in BwK. We present two algorithms that are oblivious to the values of the parameters but guarantee competitive ratios that smoothly interpolate between the best possible guarantees in the two extreme cases, depending on the values of the parameters. Our guarantees substantially improve over the adversarial guarantee, especially when the available budget is small. We also prove bounds on the best achievable guarantee, showing that our results are approximately tight when the budget is small.