In pursuit of a more sustainable and cost-efficient last mile, parcel lockers have gained a firm foothold in the parcel delivery landscape. To fully exploit their potential while ensuring customer satisfaction, successful management of the locker's limited capacity is crucial. This is challenging because future delivery requests and pickup times are stochastic from the provider's perspective. In response, we propose to dynamically control whether the locker is presented as an available delivery option to each incoming customer, with the goal of maximizing the number of served requests weighted by their priority. Additionally, we take different compartment sizes into account, which entails a second type of decision, because parcels scheduled for delivery must be allocated to compartments. We formalize the problem as an infinite-horizon sequential decision problem and find that exact methods are intractable due to the curses of dimensionality. In light of this, we develop a solution framework that orchestrates multiple algorithmic techniques rooted in Sequential Decision Analytics and Reinforcement Learning, namely cost function approximation and an offline-trained parametric value function approximation together with a truncated online rollout. Combining these techniques enables us to address the strong interrelations between the two decision types. As a general methodological contribution, we enhance the training of our value function approximation with a modified version of experience replay that enforces structure in the value function. Our computational study shows that our method outperforms a myopic benchmark by 13.7% and an industry-inspired policy by 12.6%.