OPEn: An Open-ended Physics Environment for Learning Without a Task

Humans have mental models that allow them to plan, experiment, and reason in the physical world. How should an intelligent agent go about learning such models? In this paper, we study whether models of the world learned in an open-ended physics environment, without any specific tasks, can be reused for downstream physics reasoning tasks. To this end, we build a benchmark, the Open-ended Physics ENvironment (OPEn), and design several tasks that explicitly test the representations learned in this environment. This setting reflects the conditions in which real agents (e.g., rolling robots) find themselves: they may be placed in a new kind of environment and must adapt without any teacher to tell them how it works. The setting is challenging because it requires solving an exploration problem in addition to model-building and representation-learning problems. We test several existing RL-based exploration methods on this benchmark and find that an agent that uses unsupervised contrastive learning for representation learning and impact-driven learning for exploration achieves the best results. However, all models still fall short in sample efficiency when transferring to the downstream tasks. We expect that OPEn will encourage the development of novel rolling-robot agents that build reusable mental models of the world that facilitate many tasks.
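The abstract names the two ingredients of the best-performing agent but gives no formulas. As a rough illustration only, the sketch below assumes that "unsupervised contrastive learning" means an InfoNCE-style objective and that "impact-driven learning" means a RIDE-style bonus (Raileanu and Rocktäschel, 2020), r = ||φ(s_{t+1}) - φ(s_t)||₂ / sqrt(N_ep(s_{t+1})), where φ is a learned state embedding and N_ep counts visits within the current episode. OPEn's actual implementation may differ. The names `info_nce_loss` and `ImpactBonus`, the temperature value, and the hashable `state_key` (a hypothetical discretisation of the continuous next state, needed for counting) are all assumptions; training of φ itself is omitted.

```python
import numpy as np
from collections import Counter


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE contrastive loss (assumed form, not OPEn's code).

    anchors, positives: arrays of shape (B, D). Row i of `positives` is the
    positive for row i of `anchors`; the other B-1 rows serve as negatives.
    """
    # L2-normalise so dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # (B, B) similarity matrix
    # numerically stable log-softmax over each row
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # the correct "class" for row i is column i
    return float(-np.mean(np.diag(log_probs)))


class ImpactBonus:
    """RIDE-style intrinsic reward (assumed form, not OPEn's code).

    Rewards a large change in the embedding phi, discounted by how often the
    next state has already been visited in the current episode.
    """

    def __init__(self):
        self.episode_counts = Counter()

    def reset_episode(self):
        # episodic counts are cleared at the start of every episode
        self.episode_counts.clear()

    def __call__(self, phi_t, phi_next, state_key):
        # state_key: hypothetical hashable discretisation of s_{t+1}
        self.episode_counts[state_key] += 1
        impact = np.linalg.norm(phi_next - phi_t)
        return float(impact / np.sqrt(self.episode_counts[state_key]))


# Usage: a repeated transition earns a smaller bonus within one episode.
bonus = ImpactBonus()
print(bonus(np.zeros(2), np.array([3.0, 4.0]), state_key=(1, 2)))  # 5.0
print(bonus(np.zeros(2), np.array([3.0, 4.0]), state_key=(1, 2)))  # 5 / sqrt(2)
```

The count-based denominator is what keeps such an agent from repeatedly triggering the same high-impact event; whether OPEn's agent uses exactly this discount is not stated in the abstract.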