与基地主计长一道,实现长期Horizon Sparass-resward机器人任务抽样有效学习 (Achieving Sample-Efficient Learning of Long-Horizon Sparse-Reward Robotic Tasks with Base Controllers) - 专知论文

会员服务 ·

0

学成 · 控制器 · 确定性策略 · 机器人 · 基 ·

2021 年 3 月 23 日

Achieving Sample-Efficient Learning of Long-Horizon Sparse-Reward Robotic Tasks with Base Controllers

翻译：与基地主计长一道,实现长期Horizon Sparass-resward机器人任务抽样有效学习

Minjian Xin,Guangming Wang,Zhe Liu,Hesheng Wang

from arxiv, 7 pages, 5 figures, under review

Application of Deep Reinforcement Learning (DRL) algorithms in robotic tasks faces many challenges. On the one hand, reward-shaping for complex tasks that involve multiple sequences is difficult and may result in sub-optimal performances. On the other hand, a sparse-reward setting renders exploration inefficient, and exploration using physical robots is of high-cost and unsafe. In this paper we propose a method of learning long-horizon sparse-reward tasks utilizing one or more existing controllers. Built upon Deep Deterministic Policy Gradients (DDPG), our algorithm incorporates the controllers into stages of exploration, policy update, and most importantly, learning a heuristic value function that naturally interpolates along task trajectories. Through experiments ranging from stacking blocks to cups, we present a straightforward way of synthesizing these controllers, and show that the learned state-based or image-based policies steadily outperform them. Compared to previous works of learning from demonstrations, our method improves sample efficiency by orders of magnitude. Overall, our method bears the potential of leveraging existing industrial robot manipulation systems to build more flexible and intelligent controllers.

翻译：在机器人任务中应用深强化学习(DRL)算法面临许多挑战。一方面,为涉及多个序列的复杂任务打分是困难的,可能会导致亚最佳性能。另一方面,由于奖励程度低,勘探效率低,使用物理机器人的探索成本高且不安全。在本文件中,我们提出了一个方法,利用一个或多个现有控制器学习长期和高度分散的奖励任务。在深确定性政策梯度(DDPG)上建立起来,我们的算法将控制器纳入探索、政策更新阶段,最重要的是,学习自然在任务轨迹上相互交错的超值功能。通过从堆叠块到杯子的实验,我们展示了将这些控制器合成的直截了当的方法,并表明以状态或图像为基础的政策稳步地超越了这些控制器。与以往的从演示中学习的工程相比,我们的方法提高了抽样效率。总的来说,我们的方法具有利用现有工业机器人操纵系统来建立更灵活和智能控制器的潜力。

0

相关内容

【DeepMind】基于模型的强化学习，174页ppt，Model-Based Reinforcement Learning

【DeepMind】基于模型的强化学习，174页ppt，Model-Based Reinforcement Learning

专知会员服务

89+阅读 · 2021年1月12日

【RLChina2020公开课】Lecture-11.pdf【多智能体学习与游戏AI前沿】

【RLChina2020公开课】Lecture-11.pdf【多智能体学习与游戏AI前沿】

专知会员服务

27+阅读 · 2020年8月6日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

【斯坦福大学】Gradient Surgery for Multi-Task Learning

【斯坦福大学】Gradient Surgery for Multi-Task Learning

专知会员服务

47+阅读 · 2020年1月23日

【论文推荐中科院自动化所】视频游戏中深度强化学习的研究综述，A Survey of Deep Reinforcement Learning in Video

【论文推荐中科院自动化所】视频游戏中深度强化学习的研究综述，A Survey of Deep Reinforcement Learning in Video

专知会员服务

48+阅读 · 2019年12月24日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

【泡泡前沿追踪】跟踪SLAM前沿动态系列之IROS2018

【泡泡前沿追踪】跟踪SLAM前沿动态系列之IROS2018

泡泡机器人SLAM

29+阅读 · 2018年10月28日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

A predictive safety filter for learning-based control of constrained nonlinear dynamical systems

Arxiv

0+阅读 · 2021年5月17日

Solving Challenging Dexterous Manipulation Tasks With Trajectory Optimisation and Reinforcement Learning

Arxiv

0+阅读 · 2021年5月16日

Model-Based Offline Planning with Trajectory Pruning

Arxiv

0+阅读 · 2021年5月16日

Active Model Learning using Informative Trajectories for Improved Closed-Loop Control on Real Robots

Arxiv

0+阅读 · 2021年5月14日

Coarse-to-Fine Imitation Learning: Robot Manipulation from a Single Demonstration

Arxiv

0+阅读 · 2021年5月13日

An Open-Source Multi-Goal Reinforcement Learning Environment for Robotic Manipulation with Pybullet

Arxiv

0+阅读 · 2021年5月12日

Learning to Walk via Deep Reinforcement Learning

Arxiv

7+阅读 · 2018年12月26日

CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving

CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving

Arxiv

8+阅读 · 2018年7月10日

A survey on policy search algorithms for learning robot controllers in a handful of trials

Arxiv

3+阅读 · 2018年7月6日

Safety-aware Adaptive Reinforcement Learning with Applications to Brushbot Navigation

Arxiv

4+阅读 · 2018年1月29日

VIP会员

文章信息

相关主题

确定性策略

相关VIP内容

【DeepMind】基于模型的强化学习，174页ppt，Model-Based Reinforcement Learning

【DeepMind】基于模型的强化学习，174页ppt，Model-Based Reinforcement Learning

专知会员服务

89+阅读 · 2021年1月12日

【RLChina2020公开课】Lecture-11.pdf【多智能体学习与游戏AI前沿】

【RLChina2020公开课】Lecture-11.pdf【多智能体学习与游戏AI前沿】

专知会员服务

27+阅读 · 2020年8月6日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

【斯坦福大学】Gradient Surgery for Multi-Task Learning

【斯坦福大学】Gradient Surgery for Multi-Task Learning

专知会员服务

47+阅读 · 2020年1月23日

【论文推荐中科院自动化所】视频游戏中深度强化学习的研究综述，A Survey of Deep Reinforcement Learning in Video

【论文推荐中科院自动化所】视频游戏中深度强化学习的研究综述，A Survey of Deep Reinforcement Learning in Video

专知会员服务

48+阅读 · 2019年12月24日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

[ICCV2025]EAMamba：面向图像恢复的高效全能视觉状态空间模型

ICCV 2025 | 超越π0，无界智慧提出A0，首个空间可供性感知的通用操作模型

【博士论文】大规模人工智能中的强化学习智能体：高效训练与更严谨分析

大语言模型推理系统综述

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

【泡泡前沿追踪】跟踪SLAM前沿动态系列之IROS2018

【泡泡前沿追踪】跟踪SLAM前沿动态系列之IROS2018

泡泡机器人SLAM

29+阅读 · 2018年10月28日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

A predictive safety filter for learning-based control of constrained nonlinear dynamical systems

Arxiv

0+阅读 · 2021年5月17日

Solving Challenging Dexterous Manipulation Tasks With Trajectory Optimisation and Reinforcement Learning

Arxiv

0+阅读 · 2021年5月16日

Model-Based Offline Planning with Trajectory Pruning

Arxiv

0+阅读 · 2021年5月16日

Active Model Learning using Informative Trajectories for Improved Closed-Loop Control on Real Robots

Arxiv

0+阅读 · 2021年5月14日

Coarse-to-Fine Imitation Learning: Robot Manipulation from a Single Demonstration

Arxiv

0+阅读 · 2021年5月13日

An Open-Source Multi-Goal Reinforcement Learning Environment for Robotic Manipulation with Pybullet

Arxiv

0+阅读 · 2021年5月12日

Learning to Walk via Deep Reinforcement Learning

Arxiv

7+阅读 · 2018年12月26日

CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving

CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving

Arxiv

8+阅读 · 2018年7月10日

A survey on policy search algorithms for learning robot controllers in a handful of trials

Arxiv

3+阅读 · 2018年7月6日

Safety-aware Adaptive Reinforcement Learning with Applications to Brushbot Navigation

Arxiv

4+阅读 · 2018年1月29日

微信扫码咨询专知VIP会员