Action advising is a knowledge transfer technique for reinforcement learning based on the teacher-student paradigm. An expert teacher provides advice to a student during training in order to improve the student's sample efficiency and policy performance. Such advice is commonly given in the form of state-action pairs. However, advice in this form is difficult for the student to reason about and apply to novel states. We introduce Explainable Action Advising, in which the teacher provides action advice together with an explanation indicating why each action was chosen. This allows the student to self-reflect on what it has learned, enabling advice generalization and leading to improved sample efficiency and learning performance, even in environments where the teacher is sub-optimal. We empirically show that our framework is effective in both single-agent and multi-agent scenarios, yielding improved policy returns and convergence rates when compared to state-of-the-art methods.
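To make the idea of advice generalization concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes that an explanation can be represented as a set of feature bounds that held when the teacher chose its action, so that the student can reuse stored advice in any later state satisfying the same bounds. The names `Advice`, `Student`, `toy_teacher`, and the feature `distance_to_goal` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, float]


@dataclass
class Advice:
    """An advised action plus an explanation (assumed form: feature bounds)."""
    action: str
    # Each condition is (feature name, inclusive lower bound, exclusive upper bound).
    conditions: List[Tuple[str, float, float]]

    def applies_to(self, state: State) -> bool:
        # The explanation covers a state if every bound holds for that state.
        return all(
            name in state and low <= state[name] < high
            for name, low, high in self.conditions
        )


def toy_teacher(state: State) -> Advice:
    """Hypothetical teacher: a hand-written rule standing in for an expert policy."""
    if state["distance_to_goal"] < 2.0:
        return Advice("grab", [("distance_to_goal", 0.0, 2.0)])
    return Advice("move", [("distance_to_goal", 2.0, float("inf"))])


class Student:
    """Stores explained advice and reuses it before spending the advice budget."""

    def __init__(self, teacher: Callable[[State], Advice], budget: int) -> None:
        self.teacher = teacher
        self.budget = budget
        self.memory: List[Advice] = []

    def advised_action(self, state: State) -> Optional[str]:
        # First, check whether a stored explanation already covers this state.
        for advice in self.memory:
            if advice.applies_to(state):
                return advice.action
        # Otherwise ask the teacher, if any budget remains, and remember the answer.
        if self.budget > 0:
            self.budget -= 1
            advice = self.teacher(state)
            self.memory.append(advice)
            return advice.action
        # No applicable advice: the caller falls back to the student's own policy.
        return None


student = Student(toy_teacher, budget=2)
first = student.advised_action({"distance_to_goal": 5.0})   # queries the teacher
second = student.advised_action({"distance_to_goal": 9.0})  # reuses stored advice
```

In this sketch, a plain state-action pair would only help in the exact state where it was given, whereas the stored bounds let the second, previously unseen state reuse the first piece of advice without consuming budget.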