Recent advances in deep Reinforcement Learning (RL) have created unprecedented opportunities for intelligent automation, where a machine can autonomously learn an optimal policy for performing a given task. However, current deep RL algorithms predominantly specialize in a narrow range of tasks, are sample-inefficient, and lack sufficient stability, which in turn hinders their industrial adoption. This article tackles these limitations by developing and testing a Hyper-Actor Soft Actor-Critic (HASAC) RL framework based on the notions of task modularization and transfer learning. The goal of the proposed HASAC is to enhance the adaptability of an agent to new tasks by transferring the policies learned on previous tasks to the new task via a "hyper-actor". The HASAC framework is tested on a new virtual robotic manipulation benchmark, Meta-World. Numerical experiments show that HASAC outperforms state-of-the-art deep RL algorithms in terms of reward value, success rate, and task completion time.
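To make the hyper-actor idea concrete, the sketch below shows one plausible reading of policy transfer across tasks: frozen actors from previously learned tasks are blended with a fresh actor for the new task through a learned, state-dependent gate. This is a minimal illustration assuming a PyTorch SAC-style setup; the network sizes, the gating mechanism, and the class names (`GaussianActor`, `HyperActor`) are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    """Small SAC-style actor producing a tanh-squashed Gaussian action (sketch)."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.net(obs)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        return torch.tanh(dist.rsample())  # reparameterized sample, squashed to [-1, 1]

class HyperActor(nn.Module):
    """Hypothetical hyper-actor: reuses frozen actors from former tasks and mixes
    their actions with a new trainable actor via state-dependent weights."""
    def __init__(self, prev_actors, obs_dim, act_dim):
        super().__init__()
        self.prev_actors = nn.ModuleList(prev_actors)
        for a in self.prev_actors:          # transferred policies are not updated
            a.requires_grad_(False)
        self.new_actor = GaussianActor(obs_dim, act_dim)
        self.gate = nn.Linear(obs_dim, len(prev_actors) + 1)  # mixing weights

    def forward(self, obs):
        actions = [a(obs) for a in self.prev_actors] + [self.new_actor(obs)]
        w = torch.softmax(self.gate(obs), dim=-1).unsqueeze(-1)
        return (torch.stack(actions, dim=1) * w).sum(dim=1)

# Usage: two actors trained on earlier tasks seed the hyper-actor for a new task.
obs_dim, act_dim = 39, 4                    # Meta-World-like observation/action sizes
old = [GaussianActor(obs_dim, act_dim) for _ in range(2)]
hyper = HyperActor(old, obs_dim, act_dim)
print(hyper(torch.randn(8, obs_dim)).shape)  # torch.Size([8, 4])
```

In this reading, only the new actor and the gate receive gradients during training on the new task, so knowledge from former tasks is transferred without being overwritten; the actual HASAC transfer mechanism is detailed in the body of the article.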