Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey

Building autonomous machines that can explore open-ended environments, discover possible interactions and build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by autotelic agents: intrinsically motivated learning agents that can learn to represent, generate, select and solve their own problems. In recent years, the convergence of developmental approaches with deep reinforcement learning (RL) methods has led to the emergence of a new field: developmental reinforcement learning. Developmental RL is concerned with the use of deep RL algorithms to tackle a developmental problem: the intrinsically motivated acquisition of open-ended repertoires of skills. The self-generation of goals requires the learning of compact goal encodings as well as their associated goal-achievement functions. This raises new challenges compared to standard RL algorithms, which were originally designed to tackle pre-defined sets of goals using external reward signals. The present paper introduces developmental RL and proposes a computational framework based on goal-conditioned RL to tackle the intrinsically motivated skill acquisition problem. It proceeds to present a typology of the various goal representations used in the literature, before reviewing existing methods to learn to represent and prioritize goals in autonomous systems. We finally close the paper by discussing some open challenges in the quest for intrinsically motivated skill acquisition.
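To make the idea of an agent that generates its own goals and rewards itself through a goal-achievement function more concrete, the following is a minimal sketch and not the framework proposed in the paper. Everything in it is an assumption chosen for illustration: the 7-cell corridor environment, the tabular Q-learning update (standing in for deep RL), uniform sampling over discovered goals (standing in for learned goal prioritization), the random exploration after a goal is reached, and all function and parameter names.

```python
import random

N_STATES = 7        # hypothetical 1-D corridor with positions 0..6
ACTIONS = (-1, 1)   # move left or right


def step(state, action):
    """Environment dynamics only; the environment returns no reward."""
    return min(max(state + action, 0), N_STATES - 1)


def goal_achieved(state, goal):
    """Goal-achievement function: the agent's internally computed reward."""
    return 1.0 if state == goal else 0.0


def pick_action(q_values, rng, eps):
    """Epsilon-greedy choice with random tie-breaking."""
    if rng.random() < eps or q_values[0] == q_values[1]:
        return rng.randrange(2)
    return 0 if q_values[0] > q_values[1] else 1


def train(episodes=3000, horizon=20, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    # Goal-conditioned action values, indexed by (state, goal).
    q = {(s, g): [0.0, 0.0] for s in range(N_STATES) for g in range(N_STATES)}
    known_goals = {0}  # the goal set grows as new states are discovered
    for _ in range(episodes):
        state, pursuing = 0, True
        goal = rng.choice(sorted(known_goals))  # self-selected goal (uniform)
        for _ in range(horizon):
            if pursuing:
                a = pick_action(q[(state, goal)], rng, eps)
            else:
                a = rng.randrange(2)  # goal reached: explore at random
            nxt = step(state, ACTIONS[a])
            known_goals.add(nxt)  # any visited state becomes a candidate goal
            if pursuing:
                r = goal_achieved(nxt, goal)
                # Reaching the goal is terminal for the learning update.
                target = r if r == 1.0 else gamma * max(q[(nxt, goal)])
                q[(state, goal)][a] += alpha * (target - q[(state, goal)][a])
                pursuing = r != 1.0
            state = nxt
    return q, known_goals


def greedy_rollout(q, goal, horizon=20):
    """Follow the greedy goal-conditioned policy from state 0."""
    state = 0
    for _ in range(horizon):
        values = q[(state, goal)]
        state = step(state, ACTIONS[0 if values[0] >= values[1] else 1])
        if goal_achieved(state, goal) == 1.0:
            return True
    return False
```

Calling `q, goals = train()` and then `greedy_rollout(q, goal)` for a discovered goal checks whether the learned policy reaches it. In this sketch the goal encoding is simply the state index; learned compact encodings and goal prioritization, which the abstract describes, are deliberately left out.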
