Action advising is a knowledge-transfer technique for reinforcement learning based on the teacher-student paradigm. An expert teacher provides advice to a student during training in order to improve the student's sample efficiency and policy performance. Such advice is commonly given in the form of state-action pairs. However, advice in this form is difficult for the student to reason about and to apply to novel states. We introduce Explainable Action Advising, in which the teacher provides action advice together with an associated explanation indicating why the action was chosen. This allows the student to self-reflect on what it has learned, enabling advice generalization and leading to improved sample efficiency and learning performance, even in environments where the teacher is sub-optimal. We empirically show that our framework is effective in both single-agent and multi-agent scenarios, yielding improved policy returns and convergence rates when compared to state-of-the-art methods.
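As a rough illustration of how an explanation can let advice carry over to unseen states, the toy sketch below may help. It is not the method from the paper: the abstract does not say what form explanations take or how the student uses them. Here an explanation is assumed to be a conjunction of threshold tests on state features, and the student is assumed to reuse a stored action whenever a new state satisfies a stored explanation. All names (`Advice`, `Teacher`, `Student`, `holds`) are hypothetical, and the reinforcement learning loop itself is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Assumed explanation format: a conjunction of threshold tests on state features.
Condition = Tuple[int, str, float]  # (feature index, "<=" or ">", threshold)


@dataclass
class Advice:
    action: int
    explanation: List[Condition]


def holds(explanation: Sequence[Condition], state: Sequence[float]) -> bool:
    """Return True if the state satisfies every condition in the explanation."""
    for idx, op, threshold in explanation:
        value = state[idx]
        if op == "<=":
            satisfied = value <= threshold
        elif op == ">":
            satisfied = value > threshold
        else:
            raise ValueError(f"unknown operator: {op}")
        if not satisfied:
            return False
    return True


class Teacher:
    """Toy teacher: a hand-written rule standing in for an expert policy."""

    def advise(self, state: Sequence[float]) -> Advice:
        if state[0] <= 0.5:
            return Advice(action=0, explanation=[(0, "<=", 0.5)])
        return Advice(action=1, explanation=[(0, ">", 0.5)])


class Student:
    """Toy student that stores advice and reuses it when an explanation applies."""

    def __init__(self) -> None:
        self.memory: List[Advice] = []
        self.teacher_queries = 0

    def recall(self, state: Sequence[float]) -> Optional[int]:
        # Reuse the first stored action whose explanation covers this state.
        for advice in self.memory:
            if holds(advice.explanation, state):
                return advice.action
        return None

    def act(self, state: Sequence[float], teacher: Teacher) -> int:
        action = self.recall(state)
        if action is not None:
            return action
        # No stored explanation applies, so ask the teacher and remember the answer.
        self.teacher_queries += 1
        advice = teacher.advise(state)
        self.memory.append(advice)
        return advice.action


teacher = Teacher()
student = Student()
states = [(0.2, 1.0), (0.4, -3.0), (0.9, 0.0), (0.7, 2.0)]
actions = [student.act(s, teacher) for s in states]
```

In this toy run the student queries the teacher only for the first and third states; the second and fourth are covered by explanations it has already stored. With plain state-action pairs, none of the four distinct states would match a stored entry exactly, so each would need its own query.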