Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning

Intrinsically motivated spontaneous exploration is a key enabler of autonomous developmental learning in human children. It enables the discovery of skill repertoires through autotelic learning, i.e. the self-generation, self-selection, self-ordering and self-experimentation of learning goals. We present an algorithmic approach called Intrinsically Motivated Goal Exploration Processes (IMGEP) to enable similar properties of autonomous learning in machines. The IMGEP architecture relies on several principles: 1) self-generation of goals, generalized as parameterized fitness functions; 2) selection of goals based on intrinsic rewards; 3) exploration with incremental goal-parameterized policy search and exploitation with a batch learning algorithm; 4) systematic reuse of information acquired when targeting a goal for improving towards other goals. We present a particularly efficient form of IMGEP, called AMB, that uses a population-based policy and an object-centered spatio-temporal modularity. We provide several implementations of this architecture and demonstrate their ability to automatically generate a learning curriculum within several experimental setups. One of these experiments includes a real humanoid robot exploring multiple spaces of goals with several hundred continuous dimensions and with distractors. While no particular target goal is provided to these autotelic agents, this curriculum allows the discovery of diverse skills that act as stepping stones for learning more complex skills, e.g. nested tool use.
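The goal-exploration loop summarized above can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper. The environment is a hypothetical two-module toy with one controllable object and one randomly moving distractor. A goal is a target point in one module's outcome space, scored by a negative-distance fitness function. The "population-based policy" is approximated by reusing the stored parameters whose outcome best fits the goal and adding Gaussian noise. The intrinsic reward is a simple per-module running average of progress. All names and parameters here (`environment`, `fitness`, `imgep`, `sigma`, `eps`, `rate`) are illustrative. The sketch is not the authors' AMB implementation, and it covers only the exploration phase, not batch-learning exploitation.

```python
import math
import random


# Hypothetical toy environment (an assumption, not from the paper):
# a policy is a 2-D parameter vector theta in [-1, 1]^2. Module 0 observes an
# object the agent controls; module 1 observes a distractor that moves at
# random regardless of theta.
def environment(theta, rng):
    controlled = (math.tanh(2 * theta[0]), math.tanh(2 * theta[1]))
    distractor = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    return (controlled, distractor)


def fitness(outcome, module, goal):
    """A goal parameterizes a fitness function: here (by assumption) the
    negative distance between the goal and the module's part of the outcome."""
    return -math.dist(outcome[module], goal)


def imgep(n_bootstrap=100, n_iterations=400, sigma=0.05, eps=0.2,
          rate=0.1, seed=0):
    rng = random.Random(seed)
    n_modules = 2
    memory = []                   # (theta, outcome) pairs shared by all modules
    interest = [0.0] * n_modules  # running average of progress per module
    counts = [0] * n_modules      # how often each module was selected

    # Bootstrap: random parameter sampling fills the memory.
    for _ in range(n_bootstrap):
        theta = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        memory.append((theta, environment(theta, rng)))

    for _ in range(n_iterations):
        # Select a goal space from intrinsic rewards: with probability eps
        # (or while no progress has been seen) pick uniformly, otherwise
        # sample modules in proportion to their interest.
        if rng.random() < eps or sum(interest) == 0:
            module = rng.randrange(n_modules)
        else:
            module = rng.choices(range(n_modules), weights=interest)[0]
        counts[module] += 1

        # Self-generate a goal in that module's outcome space.
        goal = (rng.uniform(-1, 1), rng.uniform(-1, 1))

        # Reuse the stored parameters whose outcome best fits the goal,
        # then perturb them (incremental goal-parameterized policy search).
        best_theta, best_outcome = max(
            memory, key=lambda item: fitness(item[1], module, goal))
        theta = tuple(min(1.0, max(-1.0, t + rng.gauss(0, sigma)))
                      for t in best_theta)
        outcome = environment(theta, rng)

        # Intrinsic reward: improvement over the best previously stored
        # outcome for this goal, folded into the module's running average.
        progress = max(0.0, fitness(outcome, module, goal)
                       - fitness(best_outcome, module, goal))
        interest[module] = (1 - rate) * interest[module] + rate * progress

        # Every outcome is stored and later reused for goals in all modules.
        memory.append((theta, outcome))

    return memory, interest, counts


if __name__ == "__main__":
    memory, interest, counts = imgep()
    print("selections per module:", counts)
    print("interest per module:", [round(i, 4) for i in interest])
```

In this toy setting, the distractor module rarely yields progress because its outcomes do not depend on the chosen parameters. Its interest therefore stays near zero, and most goals end up being sampled in the controllable module. This mirrors, in a much simplified form, how intrinsic rewards can steer exploration away from distractors.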
