The travelling salesperson problem (TSP) is a classic resource allocation problem: find the order in which to perform a set of tasks that minimizes (or maximizes) an associated objective function. It is widely used in robotics for applications such as planning and scheduling. In this work, we solve TSP for two objectives using reinforcement learning (RL). In multi-objective optimization problems, the objective functions often conflict, so optimality is defined in terms of Pareto optimality: a solution is Pareto optimal if no objective can be improved without worsening another. The set of Pareto optimal solutions in the objective space forms the Pareto front (or frontier), and each solution on it represents a different trade-off between the objectives. We present the Pareto frontier approximation network (PA-Net), a network that generates good approximations of the Pareto front for the bi-objective travelling salesperson problem (BTSP). We first convert BTSP into a constrained optimization problem, then train our network to solve this constrained problem using Lagrangian relaxation and policy gradient. PA-Net improves on an existing deep RL-based method: the hypervolume metric, which measures the quality of the approximated Pareto front, improves by 2.3% on average, while inference is 4.5x faster. Finally, we apply PA-Net to find an optimal visiting order in robotic navigation and coverage planning tasks. Our code is available on the project website.
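To make the notions of Pareto front and hypervolume concrete, the following is a minimal Python sketch, not the PA-Net implementation. It assumes both objectives are minimized, and the function names `pareto_front` and `hypervolume_2d`, the sample cost pairs, and the reference point are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points (both objectives minimized),
    sorted by increasing first objective."""
    front: List[Point] = []
    best_f2 = float("inf")
    # After sorting by (f1, f2), a point is non-dominated exactly when
    # its f2 is strictly smaller than every f2 seen so far.
    for f1, f2 in sorted(set(points)):
        if f2 < best_f2:
            front.append((f1, f2))
            best_f2 = f2
    return front


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the Pareto front of `points` and bounded by the
    reference point `ref` (assumed worse than the front in both objectives)."""
    r1, r2 = ref
    # Points that do not strictly improve on the reference contribute nothing.
    inside = [(f1, f2) for f1, f2 in points if f1 < r1 and f2 < r2]
    area = 0.0
    prev_f2 = r2
    # Sweep the front left to right, adding one horizontal strip per point.
    for f1, f2 in pareto_front(inside):
        area += (r1 - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return area


# Hypothetical (cost_1, cost_2) pairs for five candidate tours.
tours = [(1.0, 4.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
front = pareto_front(tours)
hv = hypervolume_2d(tours, ref=(5.0, 5.0))
```

Under these assumptions, a larger hypervolume means the approximated front covers more of the objective space between the solutions and the reference point, which is why it can serve as a single-number quality measure when comparing two approximations.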