Deep Reinforcement Learning based Dynamic Optimization of Bus Timetable

Bus timetable optimization is key to reducing the operational cost of bus companies and improving service quality. Existing methods use exact or heuristic algorithms to optimize the timetable offline. In practice, passenger flow may change significantly over time, and a timetable determined offline cannot adjust its departure intervals to match the changed flow. To improve the online performance of bus timetables, we propose a Deep Reinforcement Learning based bus Timetable dynamic Optimization method (DRL-TO). In this method, timetable optimization is treated as a sequential decision problem. A Deep Q-Network (DQN) is employed as the decision model to determine whether to dispatch a bus service during each minute of the service period. As a result, the departure intervals of bus services are determined in real time in accordance with passenger demand. We identify several new and useful state features for the DQN, including the load factor, the carrying capacity utilization rate, and the number of stranded passengers. Taking into account the interests of both the bus company and passengers, we design a reward function that includes the indicators of full load rate, empty load rate, passengers' waiting time, and the number of stranded passengers. Building on an existing method for calculating the carrying capacity, we develop a new technique to enhance the matching degree at each bus station. Experiments demonstrate that, compared with the timetables generated by the state-of-the-art bus timetable optimization approach based on a memetic algorithm (BTOA-MA), a Genetic Algorithm (GA), and the manual method, DRL-TO can dynamically determine the departure intervals based on the real-time passenger flow, saving 8% of vehicles and reducing passengers' waiting time by 17% on average.
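
The per-minute dispatch decision described above can be illustrated with a minimal sketch. It is not the DRL-TO implementation: the `State` fields only mirror the three state features named in the abstract, their exact definitions are assumptions, and a hand-set linear scorer (`q_values`) stands in for the trained DQN. The reward function and the training loop are not modeled, and all names and weights are hypothetical.

```python
import random
from dataclasses import dataclass

HOLD, DISPATCH = 0, 1
ACTIONS = (HOLD, DISPATCH)


@dataclass
class State:
    """Hypothetical per-minute observation (field definitions are assumptions)."""
    minute: int                  # minutes since the start of the service period
    load_factor: float           # assumed: expected load divided by bus capacity
    capacity_utilization: float  # assumed: used carrying capacity over offered capacity
    stranded: int                # passengers left behind by the previous service


def q_values(state, weights):
    """Linear stand-in for the DQN: one score per action (not a deep network)."""
    features = (1.0, state.load_factor, state.capacity_utilization, float(state.stranded))
    return [sum(w * f for w, f in zip(weights[a], features)) for a in ACTIONS]


def choose_action(state, weights, epsilon, rng):
    """Epsilon-greedy choice between holding and dispatching a bus."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    q = q_values(state, weights)
    return max(ACTIONS, key=lambda a: q[a])


def build_timetable(states, weights, epsilon=0.0, seed=0):
    """Roll the policy over the service period; return the departure minutes."""
    rng = random.Random(seed)
    return [s.minute for s in states
            if choose_action(s, weights, epsilon, rng) == DISPATCH]


# Illustrative weights only: holding has a constant score, while dispatching
# scores higher as the load factor and the stranded-passenger count grow.
EXAMPLE_WEIGHTS = (
    (0.5, 0.0, 0.0, 0.0),  # HOLD
    (0.0, 1.0, 0.0, 0.1),  # DISPATCH
)

EXAMPLE_STATES = [
    State(minute=0, load_factor=0.2, capacity_utilization=0.1, stranded=0),
    State(minute=1, load_factor=0.8, capacity_utilization=0.6, stranded=0),
    State(minute=2, load_factor=0.3, capacity_utilization=0.4, stranded=5),
    State(minute=3, load_factor=0.4, capacity_utilization=0.3, stranded=0),
]

if __name__ == "__main__":
    print(build_timetable(EXAMPLE_STATES, EXAMPLE_WEIGHTS))
```

With `epsilon=0.0` the rollout is purely greedy, so the departure interval falls out of the sequence of per-minute decisions rather than being fixed in advance. A trained network would replace `q_values`, and the state for each minute would come from the real-time passenger flow instead of a fixed list.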
