Motivated by online decision-making in time-varying combinatorial environments, we study the problem of transforming offline algorithms to their online counterparts. We focus on offline combinatorial problems that are amenable to a constant factor approximation using a greedy algorithm that is robust to local errors. For such problems, we provide a general framework that efficiently transforms offline robust greedy algorithms to online ones using Blackwell approachability. We show that the resulting online algorithms have $O(\sqrt{T})$ (approximate) regret under the full information setting. We further introduce a bandit extension of Blackwell approachability that we call Bandit Blackwell approachability. We leverage this notion to transform greedy robust offline algorithms into a $O(T^{2/3})$ (approximate) regret in the bandit setting. Demonstrating the flexibility of our framework, we apply our offline-to-online transformation to several problems at the intersection of revenue management, market design, and online optimization, including product ranking optimization in online platforms, reserve price optimization in auctions, and submodular maximization. We show that our transformation, when applied to these applications, leads to new regret bounds or improves the current known bounds.
翻译:通过时间变化组合环境中的在线决策,我们研究了将离线算法转换为在线对应方的问题。我们侧重于使用对本地错误具有强力的贪婪算法而能够持续要素接近的离线组合问题。对于这些问题,我们提供了一个总体框架,有效地将离线稳健的贪婪算法转换为使用Blackwell可接近性的在线算法。我们展示了由此产生的在线算法在完整信息设置下有1美元(近似)的遗憾。我们进一步引入了黑市可接近性的强盗扩展,我们称之为Bandit Blackwell可接近性。我们利用这个概念将贪婪强势离线算法转换为$O(T ⁇ 2/3})(近似)的经常要素。我们展示了我们框架的灵活性,我们将我们的离线对线转换应用于收入管理、市场设计和在线优化的交叉问题,包括在线平台的产品排序优化,在拍卖中保留价格优化,以及次模式最大化。我们展示了我们已知的转变,在应用这些应用到这些应用时,最终导致了我们已知的转变。