Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning

Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training. This issue is especially problematic for image-based RL, where a change in just one variable, such as the background colour, can change many pixels in the image, which can lead to drastic changes in the agent's latent representation of the image, causing the learned policy to fail. To learn more robust representations, we introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled image representations exploiting the sequential nature of RL observations. We find empirically that RL algorithms utilising TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods. Since TED enforces a disentangled structure of the representation, we also find that policies trained with TED generalise better to unseen values of variables irrelevant to the task (e.g. background colour) as well as unseen values of variables that affect the optimal policy (e.g. goal positions).
