We consider a model of two competing microswimming agents engaged in a pursuit-evasion task within a low-Reynolds-number environment. The agents can only perform simple manoeuvres and sense hydrodynamic disturbances, which provide ambiguous (partial) information about the opponent's position and motion. We frame the problem as a zero-sum game: the pursuer must capture the evader in the shortest possible time, while the evader aims to defer capture for as long as possible. We show that the agents, trained via Adversarial Reinforcement Learning, overcome partial observability by discovering increasingly complex sequences of moves and countermoves that outperform known heuristic strategies and exploit the hydrodynamic environment.