We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of Reinforcement Learning (RL). SAC-X enables learning of complex behaviors - from scratch - in the presence of multiple sparse reward signals. To this end, the agent is equipped with a set of general auxiliary tasks that it attempts to learn simultaneously via off-policy RL. The key idea behind our method is that active (learned) scheduling and execution of auxiliary policies allows the agent to efficiently explore its environment - enabling it to excel at sparse-reward RL. Our experiments in several challenging robotic manipulation settings demonstrate the power of our approach.
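The core mechanism described above - a scheduler that learns which auxiliary policy to execute next so as to aid exploration - can be sketched as follows. This is a minimal illustration under assumed names (`SacXScheduler`, `choose_task`, `update` are all hypothetical), using a simple softmax over running-average main-task returns rather than the paper's exact scheduling algorithm.

```python
import math
import random


class SacXScheduler:
    """Hypothetical sketch of SAC-X-style task scheduling.

    Picks which auxiliary policy to execute next, preferring tasks whose
    past executions yielded higher main-task return (softmax over
    running-average returns). An illustration only, not the paper's
    exact method.
    """

    def __init__(self, tasks, temperature=1.0, seed=0):
        self.tasks = list(tasks)
        self.temperature = temperature
        self.avg_return = {t: 0.0 for t in self.tasks}
        self.counts = {t: 0 for t in self.tasks}
        self.rng = random.Random(seed)

    def choose_task(self):
        # Softmax preference over average main-task returns.
        prefs = [math.exp(self.avg_return[t] / self.temperature)
                 for t in self.tasks]
        total = sum(prefs)
        r = self.rng.random() * total
        acc = 0.0
        for task, p in zip(self.tasks, prefs):
            acc += p
            if r <= acc:
                return task
        return self.tasks[-1]

    def update(self, task, main_task_return):
        # Running average of the main-task return observed after
        # executing `task`'s policy for one scheduling period.
        self.counts[task] += 1
        n = self.counts[task]
        self.avg_return[task] += (main_task_return - self.avg_return[task]) / n
```

The scheduler thus biases execution toward auxiliary behaviors that have historically led to progress on the sparse main task, while the softmax keeps exploring the other tasks.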