道路规划、网络运输和强化学习的新拍卖算法 (New Auction Algorithms for Path Planning, Network Transport, and Reinforcement Learning)

We consider some classical optimization problems in path planning and network transport, and we introduce new auction-based algorithms for their optimal and suboptimal solution. The algorithms are based on mathematical ideas that are related to competitive bidding by persons for objects and the attendant market equilibrium, which underlie auction processes. However, the starting point of our algorithms is different, namely weighted and unweighted path construction in directed graphs, rather than assignment of persons to objects. The new algorithms have several potential advantages over existing methods: they are empirically faster in some important contexts, such as max-flow, they are well-suited for on-line replanning, and they can be adapted to distributed asynchronous operation. Moreover, they allow arbitrary initial prices, without complementary slackness restrictions, and thus are better-suited to take advantage of reinforcement learning methods that use off-line training with data, as well as on-line training during real-time operation. The new algorithms may also find use in reinforcement learning contexts involving approximation, such as multistep lookahead and tree search schemes, and/or rollout algorithms.

翻译：我们考虑了道路规划和网络运输中的一些典型优化问题,我们为它们的最佳和亚最佳解决方案引入了新的拍卖算法。算法基于数学理念,这些理念与个人对物品的竞争性投标以及随之而来的市场平衡有关,而这正是拍卖过程的基础。然而,我们的算法的起点是不同的,即在定向图表中进行加权和无加权的路径构造,而不是将人员分派到物体上。新的算法与现有方法相比具有若干潜在优势:在诸如最大流程等一些重要背景下,新算法在经验上速度更快,适合在线重新规划,并且可以适应于无同步操作的分布。此外,这些算法允许任意的初始价格,而没有辅助性松动限制,因此更适合利用强化学习方法,即使用在线数据培训,以及在实时操作中进行在线培训。新的算法也可能在涉及近似性的强化学习环境中使用,例如多步式外观和树搜索计划,以及/或推出算法。