Visualizing MuZero Models

MuZero, a model-based reinforcement learning algorithm that uses a value-equivalent dynamics model, achieved state-of-the-art performance in Chess, Shogi and the game of Go. In contrast to standard forward dynamics models, which predict the full next state, value-equivalent models are trained to predict future values, so their representations emphasize value-relevant information. Although value-equivalent models have shown strong empirical success, no research has yet visualized and investigated what kinds of representations these models actually learn. In this paper, we therefore visualize the latent representations of MuZero agents. We find that action trajectories may diverge between observation embeddings and the internal state transition dynamics, which could lead to instability during planning. Based on this insight, we propose two regularization techniques to stabilize MuZero's performance. Additionally, we provide an open-source implementation of MuZero along with an interactive visualizer of learned representations, which may aid further investigation of value-equivalent algorithms.
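
To make the contrast between observation embeddings and internal state transitions concrete, here is a minimal, hypothetical sketch in plain Python. It assumes the standard MuZero decomposition into a representation function h (observation to latent), a dynamics function g (latent and action to reward and next latent), and a prediction function f (latent to value; the policy head is omitted). The linear-plus-tanh maps, the toy sizes, the random untrained weights, and the helper names (`represent`, `dynamics`, `value`, `unroll`, `latent_divergence`) are illustrative assumptions, not the paper's implementation. `unroll` scores only reward and value predictions, which is what makes the objective value-equivalent. `latent_divergence` then measures how far the g-unrolled latents drift from h applied to the observations actually reached.

```python
import math
import random

# Hypothetical toy MuZero-style model (illustration only; untrained random weights).
# h: observation -> latent, g: (latent, action) -> (reward, next latent), f: latent -> value.
random.seed(0)
DIM = 4        # observation and latent size (assumed equal for simplicity)
N_ACTIONS = 2


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


W_h = rand_matrix(DIM, DIM)                              # representation weights
W_g = [rand_matrix(DIM, DIM) for _ in range(N_ACTIONS)]  # one transition matrix per action
w_r = [random.gauss(0.0, 0.5) for _ in range(DIM)]       # reward head
w_v = [random.gauss(0.0, 0.5) for _ in range(DIM)]       # value head


def represent(obs):
    """h(o): embed an observation into the latent space."""
    return [math.tanh(x) for x in matvec(W_h, obs)]


def dynamics(state, action):
    """g(s, a): predict a reward and the next latent state."""
    nxt = [math.tanh(x) for x in matvec(W_g[action], state)]
    return dot(w_r, nxt), nxt


def value(state):
    """f(s): value head only; the policy head is omitted in this sketch."""
    return dot(w_v, state)


def unroll(obs0, actions, target_values, target_rewards):
    """Unroll g from h(o_0); the loss uses only value and reward errors (value-equivalent)."""
    state = represent(obs0)
    states = [state]
    loss = (value(state) - target_values[0]) ** 2
    for k, a in enumerate(actions):
        reward, state = dynamics(state, a)
        loss += (reward - target_rewards[k]) ** 2
        loss += (value(state) - target_values[k + 1]) ** 2
        states.append(state)
    return loss, states


def latent_divergence(unrolled_states, observations):
    """Euclidean distance between each g-unrolled latent and h of the real observation."""
    return [math.dist(s, represent(o)) for s, o in zip(unrolled_states, observations)]


# Usage on a random 3-step trajectory (observations o_0..o_3, actions a_0..a_2).
observations = [[random.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(4)]
actions = [0, 1, 0]
loss, states = unroll(observations[0], actions,
                      target_values=[0.0, 0.1, 0.2, 0.3],
                      target_rewards=[0.0, 0.0, 1.0])
gaps = latent_divergence(states, observations)
print(f"value-equivalent loss: {loss:.3f}")
print("latent gaps per step:", [round(g, 3) for g in gaps])
```

Because the loss in `unroll` never compares latent states with observations, the objective places no constraint on the gaps returned by `latent_divergence` beyond step 0, which is zero by construction. Penalizing these gaps is one hypothetical way to tie the two trajectories together; it is not necessarily either of the two regularizers the paper proposes.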
