Unsupervised State Representation Learning in Atari

State representation learning, or the ability to capture latent generative factors of an environment, is crucial for building intelligent agents that can perform a wide variety of tasks. Learning such representations without supervision from rewards is a challenging open problem. We introduce a method that learns state representations by maximizing mutual information across spatially and temporally distinct features of a neural encoder of the observations. We also introduce a new benchmark based on Atari 2600 games where we evaluate representations based on how well they capture the ground truth state variables. We believe this new framework for evaluating representation learning models will be crucial for future representation learning research. Finally, we compare our technique with other state-of-the-art generative and contrastive representation learning methods. The code associated with this work is available at https://github.com/mila-iqia/atari-representation-learning
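To make the idea of maximizing mutual information between features more concrete, here is a minimal sketch of an InfoNCE-style contrastive loss, one common estimator whose minimization tightens a lower bound on mutual information. The abstract does not say which estimator or score function the method uses, so everything here is an illustrative assumption: the function name `info_nce_loss`, the plain dot-product score, and the pairing scheme (`anchors[i]` and `positives[i]` standing in for features of the same observation at different locations or adjacent time steps, with the other rows acting as negatives). For the authors' actual implementation, see the repository linked above.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(anchors, positives):
    """Mean InfoNCE loss over a batch of feature pairs (hypothetical helper).

    anchors[i] and positives[i] are assumed to be a matching pair; every
    positives[j] with j != i serves as a negative for anchors[i].
    log(batch size) minus this loss is a lower bound on the mutual information.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        scores = [dot(anchors[i], positives[j]) for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        # Cross-entropy of picking the true partner among the candidates.
        total += log_norm - scores[i]
    return total / n


# Toy usage: matching pairs should score a lower loss than mismatched ones.
features = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = info_nce_loss(features, features)
loss_swapped = info_nce_loss(features, features[::-1])
```

In this toy batch the matched pairing gives the lower loss, which is the behavior a contrastive encoder is trained to produce.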


Related content

Representation learning


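Representation learning derives vector representations from training data, which overcomes the limitations of hand-crafted features. Methods generally fall into two categories: unsupervised and supervised. Most unsupervised methods use the latent variables of an autoencoder (for example, a denoising or sparse autoencoder) as the representation. The more recent variational autoencoders are more tolerant of noise and outliers. However, inferring the latent structure of the given data exactly is nearly impossible, so several approximate inference strategies are used instead. Other unsupervised methods aim to approximate a specific similarity measure. One proposed similarity-preserving framework uses matrix factorization to preserve pairwise dynamic time warping (DTW) similarities. Another learns DTW-preserving shapelets, so that Euclidean distance in the transformed space approximates the true DTW distance between the original series. Supervised representation learning methods can exploit label information to better capture the semantic structure of the data. Siamese networks and triplet networks are two popular models; their objective is to maximize the distance between classes while minimizing the distance within each class.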
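For readers unfamiliar with dynamic time warping (DTW), the distance that the similarity-preserving methods above try to approximate, the following is a minimal sketch of the classic dynamic-programming formulation. It assumes one-dimensional series, an absolute-difference local cost, and no warping-window constraint; the function name `dtw_distance` is hypothetical, and this is not the code of any framework mentioned here.

```python
def dtw_distance(a, b):
    """Unconstrained DTW distance between two 1-D sequences (illustrative)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance a only, advance b only, advance both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# Toy usage: a repeated sample is absorbed by warping, so the distance is zero,
# even though the two series have different lengths.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
offset = dtw_distance([0, 0], [1, 1])
```

Unlike Euclidean distance, DTW compares series of different lengths and ignores local stretching in time, which is why a learned embedding is needed before Euclidean distance can stand in for it.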