Model Predictive Actor-Critic: Accelerating Robot Skill Acquisition with Deep Reinforcement Learning

Substantial advances in model-based reinforcement learning have been impeded by the model bias induced by the collected data, which generally hurts performance. At the same time, the inherent sample efficiency of these algorithms makes them useful for most robot applications, because it limits potential damage to the robot and its environment during training. Inspired by information-theoretic model predictive control and advances in deep reinforcement learning, we introduce Model Predictive Actor-Critic (MoPAC), a hybrid model-based/model-free method that combines model predictive rollouts with policy optimization to mitigate model bias. MoPAC leverages optimal trajectories to guide policy learning but explores via its model-free method, allowing the algorithm to learn more expressive dynamics models. This combination guarantees optimal skill learning up to an approximation error and reduces the physical interaction with the environment that training requires, making it suitable for real-robot training. We provide extensive results showing that the proposed method generally outperforms the current state of the art, and conclude by evaluating MoPAC on a physical robotic hand learning valve rotation and finger gaiting, a task that requires grasping, manipulation, and then regrasping of an object.
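The abstract names information-theoretic model predictive control as an inspiration but does not spell out the update. The following is a minimal sketch of one common form of that idea (an MPPI-style, cost-weighted average of sampled action sequences), not the authors' MoPAC implementation. The 1-D point-mass `model_step`, the quadratic `step_cost`, and all parameter values are hypothetical stand-ins; in a method like the one described, the dynamics would be a learned model.

```python
import math
import random


def model_step(state, action):
    """Hypothetical stand-in for a learned dynamics model: a 1-D point mass."""
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    return (pos, vel)


def step_cost(state, action, target=1.0):
    """Hypothetical cost: squared distance to target plus a small control penalty."""
    pos, _ = state
    return (pos - target) ** 2 + 1e-3 * action ** 2


def rollout_cost(state, actions):
    """Total cost of rolling an action sequence through the model."""
    total = 0.0
    for action in actions:
        state = model_step(state, action)
        total += step_cost(state, action)
    return total


def mppi_update(state, nominal, rng, num_samples=64, noise_std=0.5, temperature=1.0):
    """One MPPI-style update of a nominal action sequence.

    Samples perturbed sequences, scores each with a model rollout, and shifts
    the nominal plan by the exponentially cost-weighted average perturbation.
    """
    noises = [[rng.gauss(0.0, noise_std) for _ in nominal] for _ in range(num_samples)]
    costs = [
        rollout_cost(state, [u + e for u, e in zip(nominal, eps)]) for eps in noises
    ]
    best = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    norm = sum(weights)
    return [
        u + sum(w * eps[t] for w, eps in zip(weights, noises)) / norm
        for t, u in enumerate(nominal)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    start = (0.0, 0.0)
    plan = [0.0] * 15
    for _ in range(30):
        plan = mppi_update(start, plan, rng)
    print("cost of zero plan:", rollout_cost(start, [0.0] * 15))
    print("cost of refined plan:", rollout_cost(start, plan))
```

In a hybrid scheme of the kind the abstract describes, trajectories refined this way would serve as guidance for the policy, while exploration would still come from the model-free learner. How MoPAC couples the two is not specified in the abstract and is not shown here.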

