We consider the problem of optimal unsignalized intersection management for continual streams of randomly arriving robots. This problem involves solving many instances of a mixed integer program, for which the computation time using a naive optimization algorithm scales exponentially with the number of robots and lanes. Hence, such an approach is not suitable for real-time implementation. In this paper, we propose a solution framework that combines learning and sequential optimization. In particular, we propose an algorithm for learning a policy that given the traffic state information, determines the crossing order of the robots. Then, we optimize the trajectories of the robots sequentially according to that crossing order. The proposed algorithm learns a shared policy that can be deployed in a distributed manner. We validate the performance of this approach using extensive simulations. Our approach, on average, significantly outperforms the heuristics from the literature and gives near-optimal solutions. We also show through simulations that the computation time for our approach scales linearly with the number of robots.
翻译:我们考虑对随机抵达的机器人的连续流进行最佳无信号交叉管理的问题。 这个问题涉及解决混合整数程序的许多实例, 即使用天性优化算法的计算时间与机器人和车道的数量成倍增长。 因此, 这样一种方法不适合实时实施。 在本文中, 我们提出一个将学习和顺序优化相结合的解决方案框架。 特别是, 我们提出一个算法, 用于学习一种政策, 以交通状态信息为条件, 决定机器人的交叉顺序 。 然后, 我们根据该交叉顺序优化机器人的轨迹 。 提议的算法学习一种可以分布方式部署的共同政策 。 我们使用广泛的模拟来验证这一方法的绩效。 我们的方法通常大大超过文献中的超自然现象, 并给出近于最佳的解决方案 。 我们还通过模拟来显示我们方法的计算时间与机器人的数量线性计算时间 。