Autonomous Open-Ended Learning of Tasks with Non-Stationary Interdependencies

Autonomous open-ended learning is a relevant approach in machine learning and robotics, allowing the design of artificial agents that acquire goals and motor skills without the need for user-assigned tasks. A crucial issue for this approach is to develop strategies that let agents maximise their competence on as many tasks as possible in the shortest possible time. Intrinsic motivations have been shown to provide a task-agnostic signal for allocating training time among goals. While the majority of works in the field of intrinsically motivated open-ended learning focus on scenarios where goals are independent of each other, only a few have studied the autonomous acquisition of interdependent tasks, and even fewer have tackled scenarios where goals involve non-stationary interdependencies. Building on previous work, we tackle these issues at the level of decision making (i.e., building strategies to properly select between goals), and we propose a hierarchical architecture that, by treating sub-task selection as a Markov Decision Process, is able to learn interdependent skills on the basis of intrinsically generated motivations. In particular, we first deepen the analysis of a previous system, showing the importance of incorporating information about the relationships between tasks at a higher level of the architecture (that of goal selection). Then we introduce H-GRAIL, a new system that extends the previous one with an additional learning layer that stores the autonomously acquired sequences of tasks, so that they can be modified when the interdependencies are non-stationary. All systems are tested in a real robotic scenario, with a Baxter robot performing multiple interdependent reaching tasks.
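To make the idea of treating sub-task selection as a Markov Decision Process more concrete, the following is a minimal sketch under stated assumptions, not the architecture evaluated in the paper. The state is taken to be the set of goals achieved so far, an action is the choice of which goal to attempt next, and the reward is a fixed snapshot of an intrinsic signal such as competence improvement. The names `step`, `plan`, `prereqs` and `intrinsic`, the three-goal dependency chains, and the use of value iteration as the solver are all assumptions made for illustration.

```python
from itertools import combinations


def step(prereqs, state, goal):
    """Attempt `goal` in `state` (the frozenset of goals achieved so far).

    Toy assumption: the attempt succeeds iff all prerequisites are achieved.
    """
    ok = prereqs[goal] <= state
    return (state | {goal} if ok else state), ok


def plan(prereqs, intrinsic, gamma=0.9, sweeps=100):
    """Value iteration over the goal-selection MDP (illustrative solver).

    `intrinsic[g]` is a hypothetical snapshot of the intrinsic reward
    obtained when goal `g` is attempted successfully.
    """
    goals = sorted(prereqs)
    states = [frozenset(c) for r in range(len(goals) + 1)
              for c in combinations(goals, r)]
    values = {s: 0.0 for s in states}

    def q(state, goal):
        nxt, ok = step(prereqs, state, goal)
        reward = intrinsic.get(goal, 0.0) if ok else 0.0
        return reward + gamma * values[nxt]

    for _ in range(sweeps):
        # Synchronous update: the new table is built from the old one.
        values = {s: max(q(s, g) for g in goals) for s in states}

    # Greedy goal selection with respect to the converged values.
    return {s: max(goals, key=lambda g: q(s, g)) for s in states}


# Hypothetical chain: C requires B, and B requires A.
chain = {"A": frozenset(), "B": frozenset({"A"}), "C": frozenset({"B"})}
policy = plan(chain, {"C": 1.0})

# The interdependencies change: now B requires C, and C requires A.
changed = {"A": frozenset(), "B": frozenset({"C"}), "C": frozenset({"A"})}
new_policy = plan(changed, {"B": 1.0})
```

In this toy setting only one goal carries intrinsic reward, yet the planner selects its prerequisites first because their value is propagated backwards through the MDP. A selector that simply picked the goal with the highest intrinsic reward would keep attempting a goal whose preconditions are not met. Calling `plan` again after the dependencies change stands in, very loosely, for revising a stored task sequence when interdependencies are non-stationary.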
