This paper investigates the problem of assigning shipping requests to ad hoc couriers in the context of crowdsourced urban delivery. The shipping requests are spatially distributed, each with a limited time window between the earliest time for pickup and the latest time for delivery. The ad hoc couriers, termed crowdsourcees, also have limited time availability and carrying capacity. We propose a new deep reinforcement learning (DRL)-based approach to tackle this assignment problem. A deep Q-network (DQN) is trained that incorporates two salient features, experience replay and a target network, which enhance the efficiency, convergence, and stability of DRL training. More importantly, this paper makes three methodological contributions: 1) presenting a comprehensive and novel characterization of crowdshipping system states that encompasses the spatial-temporal and capacity information of crowdsourcees and requests; 2) embedding heuristics that leverage the information offered by the state representation and are grounded in intuitive reasoning to guide the choice of actions, thereby preserving tractability and enhancing training efficiency; and 3) integrating rule-interposing to prevent repeated visits to the same routes and node sequences during routing improvement, thereby further enhancing training efficiency by accelerating learning. The effectiveness of the proposed approach is demonstrated through extensive numerical analysis. The results show the benefits brought by heuristics-guided action choice and rule-interposing in DRL training, and the superiority of the proposed approach over existing heuristics in solution quality, computation time, and scalability. Besides its potential to improve the efficiency of crowdshipping operations planning, the proposed approach also provides a new avenue and a generic framework for other problems in the vehicle routing context.
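To make the two highlighted DQN features concrete, the following is a minimal, illustrative sketch of a DQN agent with an experience replay buffer and a periodically synchronized target network. The state dimension, action set, network architecture, and hyperparameters are hypothetical placeholders and do not reflect the paper's actual crowdshipping state representation, heuristics-guided action choice, or rule-interposing.

```python
# Minimal DQN sketch (assumed hyperparameters; not the paper's implementation).
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim


class QNetwork(nn.Module):
    """Small fully connected Q-value approximator (illustrative architecture)."""

    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.net(x)


class DQNAgent:
    def __init__(self, state_dim, n_actions, gamma=0.99, lr=1e-3,
                 buffer_size=50_000, batch_size=64, target_update=500):
        self.online = QNetwork(state_dim, n_actions)
        self.target = QNetwork(state_dim, n_actions)
        self.target.load_state_dict(self.online.state_dict())   # sync target network
        self.optimizer = optim.Adam(self.online.parameters(), lr=lr)
        self.replay = deque(maxlen=buffer_size)                  # experience replay buffer
        self.gamma, self.batch_size = gamma, batch_size
        self.target_update, self.step_count = target_update, 0
        self.n_actions = n_actions

    def act(self, state, epsilon):
        """Epsilon-greedy action choice; heuristic guidance could bias or mask this step."""
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            q = self.online(torch.as_tensor(state, dtype=torch.float32))
        return int(q.argmax().item())

    def store(self, state, action, reward, next_state, done):
        self.replay.append((state, action, reward, next_state, done))

    def train_step(self):
        if len(self.replay) < self.batch_size:
            return
        batch = random.sample(self.replay, self.batch_size)      # sample past experience
        states, actions, rewards, next_states, dones = zip(*batch)
        states = torch.as_tensor(states, dtype=torch.float32)
        actions = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
        rewards = torch.as_tensor(rewards, dtype=torch.float32)
        next_states = torch.as_tensor(next_states, dtype=torch.float32)
        dones = torch.as_tensor(dones, dtype=torch.float32)

        q_sa = self.online(states).gather(1, actions).squeeze(1)
        with torch.no_grad():                                    # target network stabilizes bootstrapping
            q_next = self.target(next_states).max(1).values
        target = rewards + self.gamma * (1.0 - dones) * q_next

        loss = nn.functional.smooth_l1_loss(q_sa, target)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        self.step_count += 1
        if self.step_count % self.target_update == 0:            # periodic target sync
            self.target.load_state_dict(self.online.state_dict())
```

In this sketch, replay breaks the temporal correlation of consecutive transitions by sampling mini-batches from past experience, while the target network holds the bootstrapping target fixed between periodic syncs; these are the standard mechanisms the abstract credits with improving training efficiency, convergence, and stability.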