(This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.) To improve the efficiency of deep reinforcement learning (DRL)-based methods for robot manipulator trajectory planning in random working environments, we present three dense reward functions that differ from the traditional sparse reward. First, a posture reward function is proposed to speed up the learning process and yield a more reasonable trajectory by modeling distance and direction constraints, which reduces blind exploration. Second, a stride reward function is proposed to improve the stability of the learning process by constraining both the distance to the target and the per-step movement of the joints. Finally, to further improve learning efficiency, we draw inspiration from the cognitive process of human behavior and propose a stage incentive mechanism comprising a hard stage incentive reward function and a soft stage incentive reward function. Extensive experiments show that the soft stage incentive reward function improves the convergence rate by up to 46.9% with state-of-the-art DRL methods. The mean reward at convergence increased by 4.4-15.5%, and its standard deviation decreased by 21.9-63.2%. In the evaluation experiments, the success rate of trajectory planning for the robot manipulator reached 99.6%.
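The abstract does not give the exact form of the three dense rewards, so the following is only a rough, hypothetical sketch of how a combined dense reward of this kind (a posture term on distance and direction, a stride penalty on per-step joint motion, and a graded soft stage incentive bonus) might be computed in practice. All function names, weights, and thresholds here are assumptions for illustration, not the formulation used in the paper.

```python
import numpy as np

def dense_reward(ee_pos, ee_dir, target_pos, target_dir, joint_step,
                 w_dist=1.0, w_orient=0.5, w_stride=0.1,
                 stage_thresholds=(0.20, 0.10, 0.05), stage_bonus=0.5):
    """Hypothetical dense reward combining posture, stride, and a soft
    stage incentive. Inputs are NumPy arrays; weights and thresholds
    are placeholder values, not the paper's."""
    # Posture term: penalize Euclidean distance and direction mismatch
    # between the end-effector and the target.
    d = np.linalg.norm(ee_pos - target_pos)
    cos_err = 1.0 - np.dot(ee_dir, target_dir) / (
        np.linalg.norm(ee_dir) * np.linalg.norm(target_dir) + 1e-8)
    r_posture = -(w_dist * d + w_orient * cos_err)

    # Stride term: penalize large per-step joint motion to keep the
    # learning process stable and the trajectory smooth.
    r_stride = -w_stride * np.linalg.norm(joint_step)

    # Soft stage incentive: a graded bonus that grows as the distance
    # falls below progressively tighter thresholds, rather than a
    # single sparse reward at the goal.
    r_stage = sum(stage_bonus * (1.0 - d / t)
                  for t in stage_thresholds if d < t)

    return r_posture + r_stride + r_stage
```

A hard stage incentive would, under the same assumptions, replace the graded term with a fixed bonus per threshold crossed; the soft variant above avoids the resulting reward discontinuities.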