Reinforcement learning has seen increasing application in real-world contexts over the past few years. However, physical environments are often imperfect, and policies that perform well in simulation may not achieve the same performance when deployed elsewhere. A common approach to mitigating this is to train agents in the presence of an adversary. The adversary acts to destabilise the agent, which in turn learns a more robust policy and can better handle realistic conditions. Many real-world applications of reinforcement learning also make use of goal-conditioning: this is particularly useful in robotics, as it allows the agent to act differently depending on which goal is selected. Here, we focus on the problem of goal-conditioned learning in the presence of an adversary. We first present DigitFlip and CLEVR-Play, two novel goal-conditioned environments that support acting against an adversary. Next, we propose EHER and CHER -- two HER-based algorithms for goal-conditioned learning -- and evaluate their performance. Finally, we unify the two threads and introduce IGOAL: a novel framework for goal-conditioned learning in the presence of an adversary. Experimental results show that combining IGOAL with EHER allows agents to significantly outperform existing approaches when acting against both random and competent adversaries.