在线线性编程:双重趋同、新算法和悔恨环 (Online Linear Programming: Dual Convergence, New Algorithms, and Regret Bounds)

We study an online linear programming (OLP) problem under a random input model in which the columns of the constraint matrix along with the corresponding coefficients in the objective function are generated i.i.d. from an unknown distribution and revealed sequentially over time. Virtually all pre-existing online algorithms were based on learning the dual optimal solutions/prices of the linear programs (LP), and their analyses were focused on the aggregate objective value and solving the packing LP where all coefficients in the constraint matrix and objective are nonnegative. However, two major open questions were: (i) Does the set of LP optimal dual prices learned in the pre-existing algorithms converge to those of the "offline" LP, and (ii) Could the results be extended to general LP problems where the coefficients can be either positive or negative. We resolve these two questions by establishing convergence results for the dual prices under moderate regularity conditions for general LP problems. Specifically, we identify an equivalent form of the dual problem which relates the dual LP with a sample average approximation to a stochastic program. Furthermore, we propose a new type of OLP algorithm, Action-History-Dependent Learning Algorithm, which improves the previous algorithm performances by taking into account the past input data as well as decisions/actions already made. We derive an $O(\log n \log \log n)$ regret bound (under a locally strong convexity and smoothness condition) for the proposed algorithm, against the $O(\sqrt{n})$ bound for typical dual-price learning algorithms, where $n$ is the number of decision variables. Numerical experiments demonstrate the effectiveness of the proposed algorithm and the action-history-dependent design.

翻译：我们在一个随机输入模式下研究在线线性程序(OLP)问题,在随机输入模式下,限制矩阵的列以及目标函数中的相应系数由未知的分布和连续披露产生。几乎所有先前存在的在线算法都基于学习线性程序(LP)的双重最佳解决方案/价格,它们的分析侧重于总目标值和包装LP,其中制约矩阵和目标中的所有系数都是非负的。然而,两个主要的未决问题是:(一) 在原存在的算法中学习的LP最佳双重价格集成于“offline” LP,和(二) 结果可能扩展至一般LP问题,其中的系数可以是正或负的。我们通过在一般LP问题的适度常规条件下确定双重价格的趋同结果。具体地说,我们找出了双重问题的相同形式,它与二元的样本平均近似近似于一个随机程序。此外,我们提议将新的LPO-Ral 运算的内值归值归为新类型,我们用NOL- AL- dalalalalalalalal 来显示前的内行的内值。