Accurately predicting the dynamics of robotic systems is crucial for model-based control and reinforcement learning. The most common way to estimate dynamics is by fitting a one-step-ahead prediction model and using it to recursively propagate the predicted state distribution over long horizons. Unfortunately, this approach is known to compound even small prediction errors, making long-term predictions inaccurate. In this paper, we propose a new parametrization for supervised learning on state-action data that stably predicts at longer horizons, which we call a trajectory-based model. This trajectory-based model takes an initial state, a future time index, and control parameters as inputs, and directly predicts the state at the future time index. Experimental results on simulated and real-world robotic tasks show that trajectory-based models yield significantly more accurate long-term predictions, improved sample efficiency, and the ability to predict task reward. Given these improved prediction properties, we conclude with a demonstration of methods for using the trajectory-based model for control.
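To make the parametrization concrete, the following is a minimal sketch of the input-output structure described above, assuming a simple feed-forward network; the class name, layer sizes, and argument names are illustrative and not taken from the paper's implementation.

```python
import torch
import torch.nn as nn


class TrajectoryModel(nn.Module):
    """Illustrative sketch of a trajectory-based dynamics model.

    Instead of mapping (state, action) -> next state and rolling the
    prediction forward recursively, it maps an initial state, a future
    time index, and control parameters directly to the state at that
    future time, so long-horizon predictions do not compound one-step errors.
    """

    def __init__(self, state_dim: int, param_dim: int, hidden_dim: int = 128):
        super().__init__()
        # Input: concatenation of [s0, t, theta]; output: predicted state s_t.
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1 + param_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, s0: torch.Tensor, t: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
        # s0: (batch, state_dim), t: (batch, 1) time index,
        # theta: (batch, param_dim) controller parameters.
        return self.net(torch.cat([s0, t, theta], dim=-1))


# Example usage: one forward pass predicts the state t steps ahead,
# with no recursive rollout of intermediate predictions.
model = TrajectoryModel(state_dim=4, param_dim=3)
s0 = torch.zeros(8, 4)
t = torch.full((8, 1), 25.0)       # predict 25 steps into the future
theta = torch.randn(8, 3)          # parameters of the control policy
s_t_hat = model(s0, t, theta)      # (8, 4) predicted future states
```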