Deep Reinforcement Learning (DeepRL) methods have been widely used in robotics to learn about the environment and acquire behaviors autonomously. Deep Interactive Reinforcement Learning (DeepIRL) adds interactive feedback from an external trainer or expert, whose advice helps the learner choose actions and thereby speeds up the learning process. However, current research has been limited to interactions that offer actionable advice for only the current state of the agent. Additionally, the agent discards the information after a single use, so the same advising process has to be repeated when the state is revisited. In this paper, we present Broad-persistent Advising (BPA), an approach that retains and reuses the processed information. It not only helps trainers give more general advice that is relevant to similar states instead of only the current state, but also allows the agent to speed up the learning process. We test the proposed approach in two continuous robotic scenarios, namely a cart-pole balancing task and a simulated robot navigation task. The results show that the performance of the agent using BPA improves relative to the DeepIRL approach while the number of interactions required from the trainer is maintained.
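The idea of retaining advice and reusing it in similar states can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the class `PersistentAdviceStore`, the function `choose_action`, the grouping of states by fixed-width bins (`bin_width`), and the reuse probability `reuse_prob` are hypothetical names and choices, not the method evaluated in the paper.

```python
import math
import random
from typing import Dict, Optional, Sequence, Tuple


class PersistentAdviceStore:
    """Hypothetical store that keeps trainer advice per region of state space.

    Assumption: "similar" states are those that fall into the same
    fixed-width bin along every dimension.
    """

    def __init__(self, bin_width: float = 0.5) -> None:
        self.bin_width = bin_width
        self._advice: Dict[Tuple[int, ...], int] = {}

    def _key(self, state: Sequence[float]) -> Tuple[int, ...]:
        # Map a continuous state to the index of the bin that contains it.
        return tuple(math.floor(x / self.bin_width) for x in state)

    def record(self, state: Sequence[float], action: int) -> None:
        # Retain the advice instead of discarding it after one use.
        self._advice[self._key(state)] = action

    def lookup(self, state: Sequence[float]) -> Optional[int]:
        # Return advice stored for any state in the same bin, if present.
        return self._advice.get(self._key(state))


def choose_action(
    state: Sequence[float],
    store: PersistentAdviceStore,
    policy_action: int,
    reuse_prob: float = 0.9,
    rng: random.Random = random.Random(0),
) -> int:
    """Reuse stored advice with probability reuse_prob, else follow the policy."""
    advised = store.lookup(state)
    if advised is not None and rng.random() < reuse_prob:
        return advised
    return policy_action


store = PersistentAdviceStore(bin_width=0.5)
store.record([0.1, 0.2], action=1)          # trainer advises action 1 once
nearby = store.lookup([0.3, 0.4])           # same bin, so the advice is reused
far_away = store.lookup([0.6, 0.2])         # different bin, no advice stored
```

In this sketch a single piece of advice covers every state in its bin, so the trainer does not need to repeat it when the agent returns to that region. Setting `reuse_prob` below 1 lets the agent's own policy override stored advice part of the time.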