Domain adaptation is a common problem in robotics, with applications such as transferring policies from simulation to the real world and lifelong learning. Performing such adaptation, however, requires informative data about the environment to be available during the adaptation. In this paper, we present domain curiosity -- a method of training exploratory policies that are explicitly optimized to provide data that allows a model to learn about the unknown aspects of the environment. In contrast to most curiosity methods, our approach explicitly rewards learning, which makes it robust to environment noise without sacrificing its ability to learn. We evaluate the proposed method by comparing how much a model can learn about environment dynamics from data collected by the proposed approach versus data collected by standard curiosity-driven and random policies. The evaluation is performed in a toy environment, two simulated robot setups, and a real-world haptic exploration task. The results show that the proposed method enables data-efficient and accurate estimation of dynamics.