Imitation learning is a primary approach to improving the efficiency of reinforcement learning by exploiting expert demonstrations. However, in many real-world scenarios, obtaining expert demonstrations can be extremely expensive or even impossible. To overcome this challenge, in this paper, we propose a novel learning framework called Co-Imitation Learning (CoIL), which exploits the agents' own past good experiences without any expert demonstrations. Specifically, we train two different agents by letting each of them alternately explore the environment and exploit the peer agent's experience. Since these experiences may be either valuable or misleading, we propose to estimate the potential utility of each piece of experience by the expected gain of the value function. The agents can thus selectively imitate each other, emphasizing the more useful experiences while filtering out noisy ones. Experimental results on various tasks show the significant superiority of the proposed Co-Imitation Learning framework, validating that the agents can benefit from each other without external supervision.
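The selective-imitation step above can be sketched in code. The following is a minimal, self-contained illustration, not the paper's actual implementation: the names (`Agent`, `utility_gain`, `co_imitate`), the tabular value representation, and the use of a TD error as the "expected gain of the value function" are all simplifying assumptions made for exposition.

```python
class Agent:
    """Hypothetical tabular agent with a simple state-value estimate.

    Illustrative sketch only; the paper's agents and utility estimator
    are not specified here and will differ in practice.
    """

    def __init__(self, n_states, lr=0.5, gamma=0.9):
        self.V = [0.0] * n_states  # state-value table
        self.lr = lr               # learning rate
        self.gamma = gamma         # discount factor

    def td_update(self, s, r, s_next):
        # Standard TD(0) update toward the bootstrapped target.
        target = r + self.gamma * self.V[s_next]
        self.V[s] += self.lr * (target - self.V[s])

    def utility_gain(self, s, r, s_next):
        # Assumed proxy for the "expected gain of the value function":
        # the TD error of the peer's transition under *this* agent's V.
        return (r + self.gamma * self.V[s_next]) - self.V[s]


def co_imitate(agent, peer_batch, threshold=0.0):
    """Selectively imitate peer experiences: keep transitions whose
    estimated utility gain exceeds a threshold, filter out the rest."""
    kept = 0
    for (s, r, s_next) in peer_batch:
        if agent.utility_gain(s, r, s_next) > threshold:
            agent.td_update(s, r, s_next)
            kept += 1
    return kept
```

In a full CoIL loop, two such agents would alternate between exploring the environment themselves and calling `co_imitate` on batches drawn from the peer's replay buffer, so each agent only absorbs peer experience that its own value estimate deems beneficial.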