We study online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$ resource constraints. By casting the learning process over a suitably defined space of strategy mixtures, we recover strong duality on a Lagrangian relaxation of the underlying optimization problem, even for general settings with non-convex reward and resource-consumption functions. Then, we provide the first best-of-many-worlds-type framework for this setting, with no-regret guarantees under stochastic, adversarial, and non-stationary inputs. Our framework yields the same regret guarantees as prior work in the stochastic case. On the other hand, when budgets grow at least linearly in the time horizon, it allows us to provide a constant competitive ratio in the adversarial case, which improves over the best known upper bound of $O(\log m \log T)$. Moreover, our framework allows the decision maker to handle non-convex reward and cost functions. We provide two game-theoretic applications of our framework to give further evidence of its flexibility. In doing so, we show that it can be employed to implement budget-pacing mechanisms in repeated first-price auctions.
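As a rough sketch of the Lagrangian relaxation mentioned above (the notation here is ours and not taken from the paper): given a space $\Xi$ of strategy mixtures, expected reward $f(\xi)$, expected per-round resource consumptions $c(\xi) \in \mathbb{R}^m$, per-round budgets $\rho = B/T \in \mathbb{R}^m_{>0}$, and dual multipliers $\lambda \in \mathbb{R}^m_{\geq 0}$, one may write
$$\mathcal{L}(\xi, \lambda) \;=\; f(\xi) + \lambda^{\top}\bigl(\rho - c(\xi)\bigr), \qquad \sup_{\xi \in \Xi}\, \inf_{\lambda \geq 0} \mathcal{L}(\xi, \lambda) \;=\; \inf_{\lambda \geq 0}\, \sup_{\xi \in \Xi} \mathcal{L}(\xi, \lambda),$$
where the equality expresses the strong duality recovered by optimizing over mixtures rather than over individual strategies, even when the reward and consumption functions are non-convex.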