Model-based reinforcement learning (MBRL) uses an imperfect model of the world to imagine trajectories of future states and plan the actions that maximize a given reward. Because these trajectories are imperfect, MBRL relies on model predictive control (MPC) to continuously re-imagine trajectories from scratch. Such regeneration of imagined trajectories carries the major computational cost and increases complexity in tasks with a longer receding horizon. We investigate how far into the future imagined trajectories can be relied upon while still maintaining acceptable reward. After each action is taken, information becomes available about its immediate effect and its impact on the outcomes expected of future actions. We therefore propose four methods for deciding whether to trust and act upon imagined trajectories: i) looking at recent errors with respect to expectations, ii) comparing the confidence in an imagined action against its execution, iii) observing the deviation in projected future states, and iv) observing the deviation in projected future rewards. An experiment analyzing the effects of acting upon imagination shows that our methods reduce computation by at least 20\% and up to 80\%, depending on the environment, while retaining acceptable reward.
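To make the decision criteria concrete, the following is a minimal, hypothetical Python sketch of an MPC-style loop that acts upon an imagined trajectory and only replans when criteria iii) and iv), the deviations in projected states and rewards, exceed a tolerance. The `planner.plan` and `env.step` interfaces, as well as the tolerance values, are assumptions for illustration and not the paper's API.

```python
import numpy as np

def should_replan(observed_state, predicted_state,
                  observed_reward, predicted_reward,
                  state_tol=0.1, reward_tol=0.05):
    """Decide whether to discard the remaining imagined trajectory.

    Illustrates criteria iii) and iv): replan when the deviation between
    what was imagined and what actually happened exceeds a tolerance.
    The tolerances here are arbitrary placeholders.
    """
    state_dev = np.linalg.norm(observed_state - predicted_state)
    reward_dev = abs(observed_reward - predicted_reward)
    return state_dev > state_tol or reward_dev > reward_tol


def control_loop(env, planner, horizon=20, episode_len=200):
    """Sketch of a control loop that re-imagines trajectories only when
    the current plan is no longer trusted (hypothetical interfaces)."""
    state = env.reset()
    # Imagined plan: list of (action, predicted_state, predicted_reward).
    plan = planner.plan(state, horizon)
    for _ in range(episode_len):
        if not plan:  # plan exhausted: must re-imagine from scratch
            plan = planner.plan(state, horizon)
        action, pred_state, pred_reward = plan.pop(0)
        state, reward, done, _ = env.step(action)
        if done:
            break
        if should_replan(state, pred_state, reward, pred_reward):
            # Fall back to full MPC replanning from the current state.
            plan = planner.plan(state, horizon)
```

In this sketch, the computational savings reported in the abstract would come from the steps where `should_replan` returns `False` and the remaining imagined actions are executed without invoking the planner again.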