We study the problem of learning the preferences of drivers and planners in the context of last mile delivery. Given a data set containing historical decisions and delivery locations, the goal is to capture the implicit preferences of the decision-makers. We consider two ways to use the historical data: one is through a probability estimation method that learns transition probabilities between stops (or zones). This is a fast and accurate method, recently studied in a VRP setting. Furthermore, we explore the use of machine learning to infer how to best balance multiple objectives such as distance, probability and penalties. Specifically, we cast the learning problem as a structured output prediction problem, where training is done by repeatedly calling the TSP solver. Another important aspect we consider is that for last-mile delivery, every address is a potential client and hence the data is very sparse. Hence, we propose a two-stage approach that first learns preferences at the zone level in order to compute a zone routing; after which a penalty-based TSP computes the stop routing. Results show that the zone transition probability estimation performs well, and that the structured output prediction learning can improve the results further. We hence showcase a successful combination of both probability estimation and machine learning, all the while using standard TSP solvers, both during learning and to compute the final solution; this means the methodology is applicable to other, real-life, TSP variants, or proprietary solvers.
翻译:我们研究的是学习驾驶员和规划者在最后一英里交付过程中的偏好的问题。鉴于包含历史决定和交付地点的数据集,我们的目标是捕捉决策者的隐含偏好。我们考虑使用历史数据的两种方法:一种是概率估计方法,学习在停止(或区域)之间的过渡概率;这是一种快速和准确的方法,最近在甚远项目环境中研究过。此外,我们探索机器学习如何推断如何最好地平衡距离、概率和惩罚等多重目标。具体地,我们把学习问题作为一个结构化的产出预测问题,通过反复调用 TSP 解答器进行培训。我们考虑的另一个重要方面是,对于最后一英里交付而言,每个地址都是潜在的客户,因此数据非常稀少。因此,我们提出一个两阶段方法,首先在区一级学习偏好,然后是划线;之后,基于惩罚的TSP 计算停止路程。结果显示,区域过渡概率估计进行得当,而结构化的产出预测可以改进结果,而结构化的学习方法则在最后选择方法期间,我们展示了整个机器学习成功组合的方法,同时学习其他的概率和选择方法。