The ability to compose learned skills to solve new tasks is an important property of lifelong-learning agents. In this work, we formalise the logical composition of tasks as a Boolean algebra. This allows us to formulate new tasks in terms of the negation, disjunction and conjunction of a set of base tasks. We then show that by learning goal-oriented value functions and restricting the transition dynamics of the tasks, an agent can solve these new tasks with no further learning. We prove that by composing these value functions in specific ways, we immediately recover the optimal policies for all tasks expressible under the Boolean algebra. We verify our approach in two domains, including a high-dimensional video game environment requiring function approximation, where an agent first learns a set of base skills and then composes them to solve a super-exponential number of new tasks.
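The abstract does not spell out the composition operators, so the following is only a minimal sketch of what composing goal-oriented value functions could look like. It assumes disjunction and conjunction act as element-wise maximum and minimum, and that negation reflects a value function between the values of an upper-bound and a lower-bound task. The table layout `(states, goals, actions)`, the function names, and all numbers are hypothetical and chosen purely for illustration.

```python
import numpy as np

# Hypothetical goal-oriented value tables, indexed as [state, goal, action].
# All values are made up for illustration; they are not learned.
q_max = np.ones((2, 2, 2))   # assumed upper-bound task (every goal desirable)
q_min = np.zeros((2, 2, 2))  # assumed lower-bound task (no goal desirable)
q_a = np.array([[[1.0, 0.0], [0.0, 0.0]],
                [[1.0, 1.0], [0.0, 0.0]]])
q_b = np.array([[[1.0, 0.0], [0.0, 1.0]],
                [[0.0, 0.0], [0.0, 1.0]]])


def q_or(q1, q2):
    """Disjunction, assumed here to be the element-wise maximum."""
    return np.maximum(q1, q2)


def q_and(q1, q2):
    """Conjunction, assumed here to be the element-wise minimum."""
    return np.minimum(q1, q2)


def q_not(q, upper=q_max, lower=q_min):
    """Negation, assumed here to reflect q between the two bounds."""
    return (upper + lower) - q


def greedy_action(q, state):
    """Pick the action with the highest value over all goals in a state."""
    return int(np.argmax(q[state].max(axis=0)))


# Compose a new task, "A and not B", without any further learning.
q_a_not_b = q_and(q_a, q_not(q_b))
action = greedy_action(q_a_not_b, state=1)
```

Under these assumptions the operators obey the usual Boolean identities, such as De Morgan's laws and double negation, which is the kind of structure the Boolean-algebra formulation relies on.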