The ability to leverage shared behaviors between tasks is critical for sample-efficient multi-task reinforcement learning (MTRL). While prior methods have primarily explored parameter and data sharing, direct behavior sharing has been limited to task families requiring similar behaviors. Our goal is to extend the efficacy of behavior sharing to more general task families that could require a mix of shareable and conflicting behaviors. Our key insight is that an agent's behavior across tasks can be used for mutually beneficial exploration. To this end, we propose a simple MTRL framework for identifying shareable behaviors over tasks and incorporating them to guide exploration. We empirically demonstrate how behavior sharing improves sample efficiency and final performance on manipulation and navigation MTRL tasks and is even complementary to parameter sharing. Result videos are available at https://sites.google.com/view/qmp-mtrl.
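To make the idea of cross-task behavior sharing for exploration concrete, below is a minimal sketch assuming a per-task setup with one policy and one Q-function per task. The selection rule shown (scoring every task's proposed action with the current task's Q-function and keeping the best) is one plausible instantiation of "identifying shareable behaviors to guide exploration", not necessarily the paper's exact mechanism; the names `policies`, `q_functions`, and `select_exploration_action` are hypothetical.

```python
# Hedged sketch: cross-task behavior sharing for exploration.
# Assumption: each task i has a policy pi_i and a Q-function Q_i.
# The selection rule below is illustrative, not the paper's confirmed method.
import numpy as np


def select_exploration_action(task_id, state, policies, q_functions, rng):
    """Propose one action per task policy, then keep the proposal that the
    current task's Q-function scores highest (hypothetical selection rule)."""
    candidates = [policy(state, rng) for policy in policies]
    scores = [q_functions[task_id](state, a) for a in candidates]
    return candidates[int(np.argmax(scores))]


# Toy usage: two 1-D tasks whose policies push the state in opposite
# directions, i.e. a mix of shareable and conflicting behaviors.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    policies = [
        lambda s, rng: np.clip(1.0 + 0.1 * rng.standard_normal(), -1, 1),   # task 0: move right
        lambda s, rng: np.clip(-1.0 + 0.1 * rng.standard_normal(), -1, 1),  # task 1: move left
    ]
    q_functions = [
        lambda s, a: float(a),   # task 0 prefers rightward actions
        lambda s, a: float(-a),  # task 1 prefers leftward actions
    ]
    action = select_exploration_action(task_id=0, state=0.0,
                                       policies=policies,
                                       q_functions=q_functions, rng=rng)
    print(f"Exploration action chosen for task 0: {action:.3f}")
```

In this toy setting, task 0 adopts another task's proposal only when its own Q-function rates that behavior highly, so shareable behaviors are reused while conflicting ones are filtered out.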