This paper addresses the problem of visual feature representation learning with the aim of improving the performance of end-to-end reinforcement learning (RL) models. Specifically, a novel architecture is proposed that uses a heterogeneous loss function, called the CRC loss, to learn improved visual features that can then be used for policy learning in RL. The CRC loss is a combination of three individual loss functions: contrastive, reconstruction, and consistency losses. The feature representation is learned in parallel with policy learning, sharing weight updates through a Siamese twin encoder model. This encoder is augmented with a decoder network and a feature-projection network to facilitate computation of the above loss components. Through empirical analysis involving latent-feature visualization, an attempt is made to provide insight into the role this loss function plays in learning new action-dependent features and how those features are linked to the complexity of the problems being solved. The proposed architecture, called CRC-RL, is shown to outperform existing state-of-the-art methods on the challenging DeepMind Control Suite environments by a significant margin, thereby establishing a new benchmark in this field.
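To make the composition of the three loss terms concrete, the following is a minimal NumPy sketch of a heterogeneous loss of this form: an InfoNCE-style contrastive term over encoder embeddings, a reconstruction term between decoder output and the observation, and a consistency term between projected features. The specific weighting scheme, the InfoNCE formulation, and all function names here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def info_nce(anchor, positive, temperature=0.1):
    """Contrastive (InfoNCE-style) loss: each anchor embedding should match
    its own positive against all other positives in the batch."""
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = a @ p.T / temperature                # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # diagonal = matched pairs

def mse(x, y):
    """Mean squared error, used for reconstruction and consistency terms."""
    return np.mean((x - y) ** 2)

def crc_loss(z_q, z_k, recon, obs, proj_q, proj_k, w=(1.0, 1.0, 1.0)):
    """Illustrative heterogeneous loss: a weighted sum of contrastive,
    reconstruction, and consistency components (weights w are assumed)."""
    l_contrastive = info_nce(z_q, z_k)       # query vs. key embeddings
    l_reconstruction = mse(recon, obs)       # decoder output vs. observation
    l_consistency = mse(proj_q, proj_k)      # projected feature agreement
    return (w[0] * l_contrastive
            + w[1] * l_reconstruction
            + w[2] * l_consistency)
```

In a Siamese setup of the kind the abstract describes, `z_q` and `z_k` would come from the two encoder branches (e.g. differently augmented views of the same observation), with only one branch receiving gradient updates directly.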