Multiobjective combinatorial optimization (MOCO) problems arise in many real-world applications. However, solving these problems exactly is very challenging, particularly when they are NP-hard. Many handcrafted heuristic methods have been proposed over the past decades to tackle different MOCO problems. In this work, we generalize the idea of neural combinatorial optimization and develop a learning-based approach to approximate the whole Pareto set for a given MOCO problem without any further search procedure. We propose a single preference-conditioned model that directly generates approximate Pareto solutions for any trade-off preference, and design an efficient multiobjective reinforcement learning algorithm to train it. The proposed method can be viewed as a learning-based extension of the widely used decomposition-based multiobjective evolutionary algorithm (MOEA/D): it uses a single model to accommodate all possible preferences, whereas other methods use a finite number of solutions to approximate the Pareto set. Experimental results show that our method significantly outperforms several baseline methods on the multiobjective traveling salesman problem, the multiobjective vehicle routing problem, and the multiobjective knapsack problem in terms of solution quality, speed, and model efficiency.
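The decomposition idea underlying this kind of preference-conditioned training can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the uniform preference sampler, and the commented training loop are assumptions, and the weighted-sum and weighted-Tchebycheff scalarizations shown are the standard MOEA/D-style decompositions that such a method could use to turn a multiobjective cost vector into a single reward.

```python
import random

def sample_preference(m):
    # Sample a random trade-off preference on the (m-1)-simplex,
    # uniformly, via normalized exponential draws (hypothetical sampler).
    draws = [random.expovariate(1.0) for _ in range(m)]
    total = sum(draws)
    return [d / total for d in draws]

def weighted_sum(costs, pref):
    # Weighted-sum scalarization: collapse an m-objective cost
    # vector into one scalar under preference weights `pref`.
    return sum(c * w for c, w in zip(costs, pref))

def tchebycheff(costs, pref, ideal):
    # Weighted-Tchebycheff scalarization used in MOEA/D-style
    # decomposition: max_i w_i * |f_i(x) - z*_i|, where z* is an
    # (approximate) ideal point.
    return max(w * abs(c - z) for c, w, z in zip(costs, pref, ideal))

# Hypothetical training-loop shape: at each update, sample a
# preference, let the preference-conditioned model construct a
# solution, and use the negated scalarized cost as the reward
# for a REINFORCE-style policy-gradient update.
```

At inference time, the same single model is queried with any user-supplied preference vector to obtain the corresponding approximate Pareto solution, rather than maintaining a separate solution per preference.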