Reinforcement Learning for Freight Booking Control Problems

Booking control problems are sequential decision-making problems that occur in the domain of revenue management. More precisely, freight booking control focuses on the problem of deciding to accept or reject bookings: given a limited capacity, accept a booking request or reject it to reserve capacity for future bookings with potentially higher revenue. This problem can be formulated as a finite-horizon stochastic dynamic program, where accepting a set of requests results in a profit at the end of the booking period that depends on the cost of fulfilling the accepted bookings. For many freight applications, the cost of fulfilling requests is obtained by solving an operational decision-making problem, which often requires solving mixed-integer linear programs. Routinely solving such operational problems when deploying reinforcement learning algorithms may be too time-consuming. The majority of booking control policies are obtained by solving problem-specific mathematical programming relaxations that are often non-trivial to generalize to new problems and, in some cases, provide quite crude approximations. In this work, we propose a two-phase approach: we first train a supervised learning model to predict the objective of the operational problem, and then we deploy the model within reinforcement learning algorithms to compute control policies. This approach is general: it can be used whenever the objective function of the end-of-horizon operational problem can be predicted, and it is particularly suitable for cases where such problems are computationally hard. Furthermore, it allows one to leverage recent advances in reinforcement learning, because routinely solving the operational problem is replaced with a single prediction. Our methodology is evaluated on two booking control problems in the literature, namely, distributional logistics and airline cargo management.
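
The two-phase idea can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not taken from the paper: one identical request arrives per period, the operational problem is a simple vehicle-count formula standing in for a mixed-integer program, the supervised model is a one-variable least-squares fit, and the reinforcement learning algorithm is tabular Q-learning. All names and constants (`T`, `REVENUE`, `operational_cost`, `fit_cost_predictor`, `q_learning`) are hypothetical.

```python
import math
import random

T = 6               # number of booking periods (hypothetical)
REVENUE = 5.0       # revenue per accepted request (hypothetical)
VEHICLE_CAP = 3     # requests that fit in one vehicle (hypothetical)
VEHICLE_COST = 9.0  # cost per vehicle used (hypothetical)


def operational_cost(n):
    """Stand-in for the expensive end-of-horizon problem (a MILP in practice)."""
    return VEHICLE_COST * math.ceil(n / VEHICLE_CAP)


def fit_cost_predictor(samples):
    """Phase 1: ordinary least-squares fit of cost ~ a + b * n."""
    xs = [n for n, _ in samples]
    ys = [c for _, c in samples]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in samples)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return lambda n: a + b * n


def q_learning(predict_cost, episodes=20000, alpha=0.1, eps=0.2, seed=0):
    """Phase 2: tabular Q-learning whose terminal reward is the predicted cost."""
    rng = random.Random(seed)
    # State is (period t, number accepted so far n); action 1 = accept, 0 = reject.
    q = {(t, n, a): 0.0 for t in range(T) for n in range(t + 1) for a in (0, 1)}
    for _ in range(episodes):
        n = 0
        for t in range(T):
            if rng.random() < eps:
                a = rng.choice((0, 1))  # explore
            else:
                a = max((0, 1), key=lambda act: q[(t, n, act)])  # exploit
            n_next = n + a
            if t == T - 1:
                # A single prediction replaces solving the operational problem.
                future = -predict_cost(n_next)
            else:
                future = max(q[(t + 1, n_next, 0)], q[(t + 1, n_next, 1)])
            q[(t, n, a)] += alpha * (REVENUE * a + future - q[(t, n, a)])
            n = n_next
    return q


# Label a handful of accepted-request counts with the "true" cost, then train.
samples = [(n, operational_cost(n)) for n in range(T + 1)]
predict = fit_cost_predictor(samples)
q = q_learning(predict)
```

The point of the design is that `operational_cost` is called only while building the training set in phase 1; inside the learning loop every end-of-horizon evaluation goes through `predict`. A linear fit smooths over the step structure of the toy cost, which mirrors the trade-off between prediction accuracy and speed that such an approach has to manage.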
