Adaptable Automation with Modular Deep Reinforcement Learning and Policy Transfer

Recent advances in deep Reinforcement Learning (RL) have created unprecedented opportunities for intelligent automation, where a machine can autonomously learn an optimal policy for performing a given task. However, current deep RL algorithms predominantly specialize in a narrow range of tasks, are sample-inefficient, and lack sufficient stability, all of which hinder their industrial adoption. This article tackles these limitations by developing and testing a Hyper-Actor Soft Actor-Critic (HASAC) RL framework based on the notions of task modularization and transfer learning. The goal of the proposed HASAC is to enhance the adaptability of an agent to new tasks by transferring the learned policies of former tasks to the new task via a "hyper-actor". The HASAC framework is tested on a new virtual robotic manipulation benchmark, Meta-World. Numerical experiments show that HASAC outperforms state-of-the-art deep RL algorithms in terms of reward value, success rate, and task completion time.

