通过离线贪婪定值法在线学习:市场设计和优化的应用 (Online Learning via Offline Greedy Algorithms: Applications in Market Design and Optimization)

Motivated by online decision-making in time-varying combinatorial environments, we study the problem of transforming offline algorithms to their online counterparts. We focus on offline combinatorial problems that are amenable to a constant factor approximation using a greedy algorithm that is robust to local errors. For such problems, we provide a general framework that efficiently transforms offline robust greedy algorithms to online ones using Blackwell approachability. We show that the resulting online algorithms have $O(\sqrt{T})$ (approximate) regret under the full information setting. We further introduce a bandit extension of Blackwell approachability that we call Bandit Blackwell approachability. We leverage this notion to transform greedy robust offline algorithms into a $O(T^{2/3})$ (approximate) regret in the bandit setting. Demonstrating the flexibility of our framework, we apply our offline-to-online transformation to several problems at the intersection of revenue management, market design, and online optimization, including product ranking optimization in online platforms, reserve price optimization in auctions, and submodular maximization. We also extend our reduction to greedy-like first order methods used in continuous optimization, such as those used for maximizing continuous strong DR monotone submodular functions subject to convex constraints. We show that our transformation, when applied to these applications, leads to new regret bounds or improves the current known bounds. We complement our theoretical studies by conducting numerical simulations for two of our applications, in both of which we observe that the numerical performance of our transformations outperforms the theoretical guarantees in practical instances.

翻译：通过时间变化组合环境中的在线决策,我们研究了将离线算法转换为在线对应方的问题。我们关注离线组合问题,这些问题使用贪婪算法对本地错误具有很强的力度。对于这些问题,我们提供了一个总体框架,有效地将离线稳健的贪婪算法转换为使用黑市可接近性的在线算法。我们展示了由此产生的在线算法在完整信息设置下有1美元(近乎)的遗憾。我们进一步引入了黑市可接近性的强盗扩展,我们称之为黑市可接近性。我们利用这个概念将贪婪强势离线算法转换成一个不变的系数近似于本地差错。我们展示了我们框架的灵活性,我们运用了我们的离线对线转换法,在收入管理、市场设计和在线优化的交叉点上,包括在线平台上的产品排序优化,在拍卖中保留价格优化,以及子模块最大化。我们还利用这个概念来将贪婪强势的离线算法转换功能推广到当前最大幅度的软化的当前应用中,我们使用了这些软化的软化的软质变压方法。