Motivated by robust dynamic resource allocation in operations research, we study the \textit{Online Learning to Transport} (OLT) problem where the decision variable is a probability measure, an infinite-dimensional object. We draw connections between online learning, optimal transport, and partial differential equations through an insight called the minimal selection principle, originally studied in the Wasserstein gradient flow setting by \citet{Ambrosio_2005}. This allows us to extend the standard online learning framework to the infinite-dimensional setting seamlessly. Based on our framework, we derive a novel method called the \textit{minimal selection or exploration (MSoE) algorithm} to solve OLT problems using mean-field approximation and discretization techniques. In the displacement convex setting, the main theoretical message underpinning our approach is that minimizing transport cost over time (via the minimal selection principle) ensures optimal cumulative regret upper bounds. On the algorithmic side, our MSoE algorithm applies beyond the displacement convex setting, making the mathematical theory of optimal transport practically relevant to non-convex settings common in dynamic resource allocation.
翻译:在业务研究中,我们以强健的动态资源分配为动力,研究决定变量是一个概率度量、无限天体的 OLT (OLT) 问题。我们通过名为最低选择原则的洞察力,在在线学习、最佳交通和部分差异方程之间建立联系,该洞察力最初在Wasserstein 梯度流设置中通过\citet{Ambrosio_2005}进行了研究。这使我们能够将标准的在线学习框架扩展到无限的无缝设置。根据我们的框架,我们得出了一种叫作textit{最小选择或探索(MSoE)算法的新型方法,用平均场近似和离散技术解决OLT问题。在置换组合设置中,我们方法的主要理论信息是(通过最小选择原理)最大限度地减少时间的运输成本,确保最佳累积的遗憾上限。在算法方面,我们的MSoE算法应用超越了移位矩设置,使最佳运输的数学理论与动态资源分配中常见的非convex 环境切实相关。