Humans are capable of abstracting various tasks as different combinations of multiple attributes. This compositional perspective is vital for rapid human learning and adaptation, since experience from related tasks can be recombined to generalize to novel compositional settings. In this work, we aim to achieve zero-shot policy generalization of Reinforcement Learning (RL) agents by leveraging task compositionality. Our proposed method is a meta-RL algorithm with a disentangled task representation that explicitly encodes different aspects of the tasks. Policy generalization is then performed by inferring the representations of unseen compositional tasks from the learned disentanglement, without extra exploration. The evaluation is conducted on three simulated tasks and a challenging real-world robotic insertion task. Experimental results demonstrate that our method achieves zero-shot policy generalization to unseen compositional tasks.
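To make the compositional idea concrete, the following is a minimal, purely illustrative sketch (the attribute names, dimensions, and random stand-ins for learned latents are assumptions, not the paper's implementation) of how a disentangled task representation allows an unseen attribute combination to be composed from blocks learned on seen tasks and fed to a task-conditioned policy without extra exploration:

```python
# Hypothetical illustration only: composing a task representation for an unseen
# attribute combination from per-attribute latent blocks learned on seen tasks.
import numpy as np

rng = np.random.default_rng(0)

# Suppose each task is described by two attributes, e.g. "shape" and "goal".
# A meta-trained encoder would learn one latent block per attribute value;
# here fixed random vectors stand in for those learned blocks.
attribute_embeddings = {
    "shape": {"peg": rng.normal(size=4), "plug": rng.normal(size=4)},
    "goal":  {"left": rng.normal(size=4), "right": rng.normal(size=4)},
}

def compose_task_representation(shape: str, goal: str) -> np.ndarray:
    """Build a disentangled task representation by concatenating per-attribute latents."""
    return np.concatenate([attribute_embeddings["shape"][shape],
                           attribute_embeddings["goal"][goal]])

def policy(observation: np.ndarray, task_z: np.ndarray) -> np.ndarray:
    """Toy task-conditioned policy: a linear map over [observation; task_z]."""
    w = rng.normal(size=(2, observation.size + task_z.size))  # 2-D action space
    return w @ np.concatenate([observation, task_z])

# Meta-training might only cover ("peg", "left") and ("plug", "right"); at test time
# the unseen combination ("peg", "right") is represented by recombining the known
# latent blocks, so its representation is inferred zero-shot.
z_unseen = compose_task_representation("peg", "right")
action = policy(np.zeros(6), z_unseen)
print(action.shape)  # (2,)
```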