Selective Dyna-style Planning Under Limited Model Capacity

In model-based reinforcement learning, planning with an imperfect model of the environment has the potential to harm learning progress. But even when a model is imperfect, it may still contain information that is useful for planning. In this paper, we investigate the idea of using an imperfect model selectively. The agent should plan in parts of the state space where the model would be helpful but refrain from using the model where it would be harmful. An effective selective planning mechanism requires estimating predictive uncertainty, which arises out of aleatoric uncertainty, parameter uncertainty, and model inadequacy, among other sources. Prior work has focused on parameter uncertainty for selective planning. In this work, we emphasize the importance of model inadequacy. We show that heteroscedastic regression can signal predictive uncertainty arising from model inadequacy that is complementary to that which is detected by methods designed for parameter uncertainty, indicating that considering both parameter uncertainty and model inadequacy may be a more promising direction for effective selective planning than either in isolation.
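The idea that heteroscedastic regression can flag model inadequacy can be illustrated with a small sketch. This is not the paper's implementation. It assumes a hypothetical deterministic 1-D environment (`true_dynamics`), a deliberately limited linear mean model, and a piecewise-constant variance head, fitted by alternating minimisation of the Gaussian negative log-likelihood rather than by training a neural network. The function names, the bin count, and the `1e-3` planning threshold are all illustrative choices.

```python
import numpy as np

def true_dynamics(x):
    # Hypothetical environment: linear on the left half, oscillating on the right.
    return 0.5 * x + np.where(x >= 0.0, 0.5 * np.sin(12.0 * x), 0.0)

def bin_index(x, edges, n_bins):
    # Map each input to the index of the variance bin that contains it.
    return np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)

def fit_heteroscedastic(x, y, n_bins=8, n_iters=20, var_floor=1e-6):
    """Alternately minimise the Gaussian NLL over a linear mean and a
    per-bin variance. The linear mean is the 'limited capacity' model."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    bins = bin_index(x, edges, n_bins)
    features = np.stack([x, np.ones_like(x)], axis=1)
    var = np.ones(n_bins)
    for _ in range(n_iters):
        # Mean step: weighted least squares with weights 1 / variance.
        sqrt_w = np.sqrt(1.0 / var[bins])
        coef, *_ = np.linalg.lstsq(features * sqrt_w[:, None], y * sqrt_w, rcond=None)
        # Variance step: the per-bin mean squared residual minimises the NLL.
        resid2 = (y - features @ coef) ** 2
        var = np.array([resid2[bins == b].mean() for b in range(n_bins)])
        var = np.maximum(var, var_floor)  # floor keeps the NLL bounded
    return coef, edges, var

def predict(coef, edges, var, x):
    # Return the predicted next state and the predicted variance at x.
    return coef[0] * x + coef[1], var[bin_index(x, edges, len(var))]

def plan_mask(pred_var, threshold=1e-3):
    # Selective planning: use the model only where predicted variance is low.
    return pred_var < threshold

# No observation noise and dense data, so any learned variance here
# reflects model inadequacy rather than aleatoric or parameter uncertainty.
x_train = np.linspace(-1.0, 1.0, 400)
y_train = true_dynamics(x_train)
coef, edges, var = fit_heteroscedastic(x_train, y_train)

queries = np.array([-0.8, -0.3, 0.3, 0.8])
mean, pred_var = predict(coef, edges, var, queries)
use_model = plan_mask(pred_var)
print(use_model)
```

In this toy setting the linear model can represent the left half of the state space exactly but not the right half, so the mask should keep the two left-hand queries for planning and reject the two right-hand ones. A Dyna-style agent would then apply simulated updates only for transitions where `use_model` is true and rely on real experience elsewhere.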

