We consider the problem of two active particles in 2D complex flows with the multi-objective goal of minimizing both the dispersion rate and the energy consumption of the pair. We approach the problem by means of Multi-Objective Reinforcement Learning (MORL), combining scalarization techniques with a Q-learning algorithm, for Lagrangian drifters that have variable swimming velocity. We show that MORL is able to find a set of trade-off solutions forming an optimal Pareto frontier. As a benchmark, we show that a set of heuristic strategies is dominated by the MORL solutions. We consider the situation in which the agents cannot update their control variables continuously, but only after a discrete (decision) time, $\tau$. We show that there is a range of decision times, between the Lyapunov time and the continuous updating limit, where Reinforcement Learning finds strategies that significantly improve over heuristics. In particular, we discuss how large decision times require enhanced knowledge of the flow, whereas for smaller $\tau$ all a priori heuristic strategies become Pareto optimal.
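The combination of scalarization, Q-learning, and a Pareto frontier mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear weighted-sum scalarization, the trade-off weight `beta`, the learning rate `alpha`, the discount `gamma`, and the tabular state and action indexing are all assumptions chosen for illustration only.

```python
def scalarize(rewards, beta):
    """Collapse two per-step rewards into one scalar.

    rewards: (r_separation, r_energy), both assumed to be negative costs.
    beta:    hypothetical trade-off weight in [0, 1].
    """
    r_separation, r_energy = rewards
    return beta * r_separation + (1.0 - beta) * r_energy


def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One standard tabular Q-learning step on a list-of-lists table Q."""
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q


def pareto_front(points):
    """Indices of non-dominated points when every objective is minimized."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(q_k <= p_k for q_k, p_k in zip(q, p))
            and any(q_k < p_k for q_k, p_k in zip(q, p))
            for q in points
        )
        if not dominated:
            front.append(i)
    return front
```

Under these assumptions, one policy would be trained per value of `beta`, each policy would be evaluated on the two objectives (dispersion and energy), and `pareto_front` would keep only the policies that no other policy beats on both objectives at once.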