While humans and animals learn incrementally throughout their lifetimes and exploit their experience to solve new tasks, standard deep reinforcement learning methods specialize in solving only one task at a time and, as a result, the information they acquire is hardly reusable in new situations. Here, we introduce a new perspective on the problem of leveraging prior knowledge to solve future, unknown tasks. We show that learning discrete, concept-like representations of sensory inputs can provide a high-level abstraction that is shared across multiple tasks, thus facilitating the transfer of information. In particular, we show that such representations can be learned by self-supervision, following an information-theoretic approach, and that they improve sample efficiency by providing prior policies that guide the policy learning process. Our method learns concepts in locomotion tasks that reduce the number of optimization steps required in both known and unknown tasks, opening a new path toward endowing artificial agents with generalization abilities.