Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning

This paper proposes a new sequential model learning architecture for solving partially observable Markov decision problems. Conventional recurrent neural network-based methods compress sequential information at every timestep. The proposed architecture instead generates one latent variable per data block, where a block spans multiple timesteps, and passes the most relevant information to the next block for policy optimization. The blockwise sequential model is implemented with self-attention, which makes it capable of detailed sequential learning in partially observable settings. The model also builds an additional learning network so that gradient estimation can be implemented efficiently with self-normalized importance sampling, which avoids the complex reconstruction of blockwise input data during model learning. Numerical results show that the proposed method significantly outperforms previous methods in various partially observable environments.
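The abstract names self-normalized importance sampling but does not give the form of the estimator. As a minimal sketch of the generic technique only (not the paper's gradient estimator), the Python below estimates an expectation under a target density using samples from a different proposal density. The function names, the Gaussian target and proposal, and the sample count are all illustrative assumptions.

```python
import math
import random


def snis_estimate(f_values, log_p, log_q):
    """Self-normalized importance sampling estimate of E_p[f(z)].

    f_values[i] = f(z_i), with z_i drawn from the proposal q.
    log_p[i], log_q[i] = log densities at z_i, each known only up to an
    additive constant; the constants cancel when the weights are normalized.
    """
    log_w = [lp - lq for lp, lq in zip(log_p, log_q)]
    shift = max(log_w)  # subtract the max before exponentiating for stability
    weights = [math.exp(lw - shift) for lw in log_w]
    total = sum(weights)
    return sum(w * f for w, f in zip(weights, f_values)) / total


def log_gaussian_unnormalized(z, mean, std):
    """Log density of N(mean, std^2) without its normalizing constant."""
    return -0.5 * ((z - mean) / std) ** 2


# Illustrative setup (assumed): target p = N(1, 1), proposal q = N(0, 2^2), f(z) = z.
rng = random.Random(0)
samples = [rng.gauss(0.0, 2.0) for _ in range(50_000)]
log_p = [log_gaussian_unnormalized(z, 1.0, 1.0) for z in samples]
log_q = [log_gaussian_unnormalized(z, 0.0, 2.0) for z in samples]
estimate = snis_estimate(samples, log_p, log_q)  # approximates E_p[z], the target mean
```

Because the weights are divided by their own sum, neither density needs to be normalized, which is the property that makes this estimator convenient when the target is only known up to a constant.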

