Decision-Oriented Learning with Differentiable Submodular Maximization for Vehicle Routing Problem

We study the problem of learning a function that maps context observations (input) to the parameters of a submodular function (output). Our motivating case study is a specific type of vehicle routing problem in which a team of Unmanned Ground Vehicles (UGVs) serves as mobile charging stations to recharge a team of Unmanned Aerial Vehicles (UAVs) that execute persistent monitoring tasks. We want to learn the mapping from observations of UAV task routes and the wind field to the parameters of a submodular objective function that describes the distribution of the UAVs' landing positions. Traditionally, such a learning problem is solved independently as a prediction phase, without considering the downstream task optimization phase. However, the loss function used in prediction may be misaligned with our final goal, i.e., good routing decisions: good performance in the isolated prediction phase does not necessarily lead to good decisions in the downstream routing task. In this paper, we propose a framework that incorporates task optimization as a differentiable layer in the prediction phase. Our framework allows end-to-end training of the prediction model without an engineered intermediate loss that targets prediction performance alone. In the proposed framework, task optimization (submodular maximization) is made differentiable by introducing stochastic perturbations into deterministic algorithms (i.e., stochastic smoothing). We demonstrate the efficacy of the proposed framework using synthetic data. Experimental results on the mobile charging station routing problem show that the proposed framework yields better routing decisions than the separate prediction-then-optimization approach, e.g., a higher average number of recharged UAVs.
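To make the idea of a differentiable submodular layer concrete, here is a minimal sketch under explicit assumptions. It is not the authors' implementation. The objective is assumed to be a weighted coverage function, where the weights `theta` stand for the predicted landing distribution over hypothetical landing points, each hypothetical candidate charging site covers a set of those points, and `k` sites are picked by the standard greedy algorithm. Greedy is piecewise constant in `theta`, so it has no useful gradient. The sketch smooths it with Gaussian perturbations, `F(theta) = E[L(greedy(theta + sigma*Z))]`, and estimates `grad F` with the Gaussian-smoothing identity `E[L(theta + sigma*Z) * Z] / sigma`. This is one common form of stochastic smoothing; the paper's exact perturbation scheme may differ. All function names and the toy instance are hypothetical.

```python
import random

def coverage_value(selected, weights, covers):
    """Weighted coverage f(S; w): total weight of landing points covered by any site in S."""
    covered = set()
    for s in selected:
        covered |= covers[s]
    return sum(weights[j] for j in covered)

def greedy(weights, covers, k):
    """Greedy maximization of f(.; w) subject to |S| <= k (deterministic, piecewise constant in w)."""
    w = [max(0.0, x) for x in weights]  # perturbed weights can go negative; clip to keep f monotone
    selected = []
    for _ in range(min(k, len(covers))):
        base = coverage_value(selected, w, covers)
        best, best_gain = None, float("-inf")
        for s in range(len(covers)):
            if s in selected:
                continue
            gain = coverage_value(selected + [s], w, covers) - base
            if gain > best_gain:  # ties go to the lowest site index
                best, best_gain = s, gain
        selected.append(best)
    return selected

def decision_loss(selected, true_weights, covers):
    """Task loss: negative coverage of the *true* landing distribution by the chosen sites."""
    return -coverage_value(selected, true_weights, covers)

def smoothed_loss_and_grad(theta, true_weights, covers, k, sigma=0.5, n_samples=500, seed=0):
    """Monte Carlo estimate of F(theta) = E[L(theta + sigma*Z)], with L(.) = decision_loss(greedy(.)),
    and of its gradient via the Gaussian-smoothing identity grad F = E[L(theta + sigma*Z) * Z] / sigma."""
    rng = random.Random(seed)
    d = len(theta)
    # Baseline (unperturbed loss) reduces variance; the estimate stays unbiased because E[Z] = 0.
    baseline = decision_loss(greedy(theta, covers, k), true_weights, covers)
    total, grad = 0.0, [0.0] * d
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        perturbed = [t + sigma * zi for t, zi in zip(theta, z)]
        loss = decision_loss(greedy(perturbed, covers, k), true_weights, covers)
        total += loss
        for i in range(d):
            grad[i] += (loss - baseline) * z[i] / sigma
    return total / n_samples, [g / n_samples for g in grad]

# Toy instance (hypothetical): 3 candidate charging sites, 4 UAV landing points.
covers = [{0, 1}, {2}, {3}]        # site index -> set of landing points it can serve
true_theta = [0.1, 0.1, 0.1, 0.7]  # true landing distribution (most UAVs land at point 3)
pred_theta = [0.4, 0.4, 0.1, 0.1]  # a poor prediction of that distribution

loss, grad = smoothed_loss_and_grad(pred_theta, true_theta, covers, k=1)
print(greedy(pred_theta, covers, 1), greedy(true_theta, covers, 1))  # [0] [2]
print(round(loss, 3), [round(g, 3) for g in grad])
```

In this toy case the prediction leads greedy to pick site 0, while the true distribution favors site 2. The estimated gradient component for landing point 3 should come out negative. That means raising the predicted weight there lowers the decision loss, even though no prediction-error term is used. In an end-to-end pipeline, this gradient would presumably be passed back through the prediction model by the chain rule. The choice of `sigma` is a trade-off. If it is too small relative to the gaps between marginal gains, the perturbed greedy almost never changes its choice and the estimate is near zero. If it is too large, the smoothed loss drifts away from the original decision loss.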
