Deep reinforcement learning (DRL) has recently shown success in tackling complex combinatorial optimization problems. When these problems are extended to multiobjective ones, existing DRL approaches struggle to flexibly and efficiently handle the multiple subproblems produced by weight decomposition of the objectives. This paper proposes a concise meta-learning-based DRL approach. It first trains a meta-model by meta-learning. The meta-model is then fine-tuned with only a few update steps to derive a submodel for each subproblem, and the Pareto front is built accordingly. Compared with other learning-based methods, our method greatly shortens the training time of the multiple submodels. Owing to the rapid and strong adaptability of the meta-model, more submodels can be derived, which increases both the quality and the diversity of the solutions found. Computational experiments on multiobjective traveling salesman problems and the multiobjective vehicle routing problem with time windows demonstrate the superiority of our method over most learning-based and iteration-based approaches.
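The workflow described above (weight decomposition into scalarized subproblems, meta-training a single model, then a few fine-tuning steps per weight vector to populate the Pareto front) can be illustrated with a minimal toy sketch. This is not the authors' implementation: it substitutes two conflicting quadratic objectives for the routing objectives, a parameter vector for the DRL policy, and a Reptile-style meta-update as one concrete choice of meta-learning algorithm; all function names are illustrative.

```python
import numpy as np

def objectives(theta):
    # Two conflicting toy objectives standing in for, e.g., tour lengths.
    return np.array([np.sum((theta - 1.0) ** 2), np.sum((theta + 1.0) ** 2)])

def grad_scalarized(theta, w):
    # Gradient of the weighted-sum subproblem w[0]*f1 + w[1]*f2.
    return 2.0 * w[0] * (theta - 1.0) + 2.0 * w[1] * (theta + 1.0)

def fine_tune(theta, w, steps=5, lr=0.1):
    # "A few update steps" adapt the meta-model to one weight vector.
    theta = theta.copy()
    for _ in range(steps):
        theta -= lr * grad_scalarized(theta, w)
    return theta

def meta_train(dim=2, meta_iters=200, meta_lr=0.5, seed=0):
    # Reptile-style meta-training over randomly sampled subproblems.
    rng = np.random.default_rng(seed)
    meta = rng.normal(size=dim)
    for _ in range(meta_iters):
        a = rng.random()
        w = np.array([a, 1.0 - a])          # sampled decomposition weights
        adapted = fine_tune(meta, w)
        meta += meta_lr * (adapted - meta)  # move meta-model toward adapted model
    return meta

meta = meta_train()
# Derive many submodels cheaply and evaluate them to build an approximate front.
front = [objectives(fine_tune(meta, np.array([a, 1.0 - a])))
         for a in np.linspace(0.0, 1.0, 11)]
```

Because adaptation is only a handful of gradient steps, the number of weight vectors (and hence submodels on the front) can be increased almost for free, which mirrors the density/diversity argument made in the abstract.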