Reinforcement learning (RL) has achieved great success on many challenging tasks through the use of deep neural networks. Although deep learning gives RL immense representational power, it also causes the well-known sample-inefficiency problem: the algorithms are data-hungry and require millions of training samples to converge to an adequate policy. One way to combat this issue is action advising in a teacher-student framework, where a knowledgeable teacher provides action advice to help the student. This work considers how to better leverage uncertainty estimates to decide when a student should ask for advice, and whether the student can model the teacher so as to request less advice. The student could decide to ask for advice when it is uncertain, or only when both it and its model of the teacher are uncertain. In addition to this investigation, this paper introduces a new method for computing a deep RL agent's uncertainty using a secondary neural network. Our empirical results show that using dual uncertainties to drive advice collection and reuse may improve learning performance across several Atari games.
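To make the dual-uncertainty idea concrete, the sketch below shows one way such an advice-request gate could be wired up: a secondary network maps the agent's state features to a scalar uncertainty estimate, and advice is requested only when both the student and its model of the teacher are uncertain. The class name `UncertaintyHead`, the architecture, and the thresholds are illustrative assumptions rather than the paper's actual implementation.

```python
import torch
import torch.nn as nn


class UncertaintyHead(nn.Module):
    """Minimal sketch of a secondary network that maps the agent's
    state features to a scalar uncertainty estimate. The architecture
    and training signal here are assumptions for illustration."""

    def __init__(self, feature_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Softplus(),  # keep the uncertainty estimate non-negative
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features)


def should_request_advice(student_unc: float,
                          teacher_model_unc: float,
                          threshold: float = 0.5) -> bool:
    """Dual-uncertainty gate: ask the teacher only when the student is
    uncertain AND its learned model of the teacher is also uncertain.
    The shared threshold is a hypothetical stand-in for whatever
    criterion the method actually uses."""
    return student_unc > threshold and teacher_model_unc > threshold
```

Under this sketch, a student whose own uncertainty head fires but whose teacher model is confident would reuse the modeled advice instead of querying the teacher, which is how the approach could reduce the advice budget.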