Many sequential decision-making problems require optimizing multiple objectives that may conflict with one another. The conventional way to handle such a multi-task problem is to form a scalar objective function as a linear combination of the individual objectives. However, when the objectives conflict and differ in scale, this method relies on trial and error to find suitable weights for the combination, and in most cases it cannot guarantee a Pareto-optimal solution. In this paper, we develop a single-agent, scale-independent, multi-objective reinforcement learning algorithm based on the Advantage Actor-Critic (A2C) algorithm. We then carry out a convergence analysis of the devised multi-objective algorithm, providing a convergence-in-mean guarantee. Finally, we evaluate the proposed algorithm experimentally on a multi-task problem. Simulation results show the superiority of the developed multi-objective A2C approach over its single-objective counterpart.
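To make the scaling issue concrete, the conventional scalarization mentioned above combines the objectives into a single return; the notation here is illustrative, since the abstract defines no symbols:

\[
J(\theta) = \sum_{i=1}^{k} w_i \, J_i(\theta), \qquad w_i \ge 0, \quad \sum_{i=1}^{k} w_i = 1,
\]

where $J_i(\theta)$ denotes the expected return of the $i$-th objective under policy parameters $\theta$ and $w_i$ is its weight. When the $J_i$ differ in scale by orders of magnitude, the weights must absorb those scale differences in addition to encoding the desired trade-off, which is why they can typically be found only by trial and error and why a fixed weight vector need not recover a Pareto-optimal policy.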