We introduce a general approach, called Invariance through Inference, for improving the test-time performance of an agent in deployment environments with unknown perceptual variations. Instead of producing invariant visual features through interpolation, Invariance through Inference turns adaptation at deployment time into an unsupervised learning problem. In practice, this is achieved by a straightforward algorithm that matches the distribution of latent features to the agent's prior experience, without relying on paired data. Although the idea is simple, we show that it leads to surprising improvements on a variety of adaptation scenarios without access to deployment-time rewards, including changes in camera poses and lighting conditions. Results are presented on the challenging distractor control suite, a robotics environment with image-based observations.
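To make the deployment-time adaptation concrete, the sketch below shows one way such unsupervised latent-distribution matching could look: an adversarial discriminator is trained to separate latents cached from the agent's prior experience from latents of deployment observations, and only the encoder is updated so that deployment latents resemble the prior distribution. This is a minimal sketch under assumed names (`Encoder`, `Discriminator`, `adapt_step`); the adversarial objective is one standard choice for matching distributions without paired data and is not necessarily the paper's exact formulation. Note that no deployment-time rewards enter the objective, only raw observations and cached latents.

```python
# Minimal sketch of deployment-time latent-distribution matching (illustrative
# names; the adversarial loss is one common way to align feature distributions
# without paired data, not necessarily the paper's exact objective).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps image observations to latent features (architecture is illustrative)."""
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(latent_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class Discriminator(nn.Module):
    """Scores whether a latent comes from prior (training-time) experience or deployment."""
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

def adapt_step(encoder, discriminator, enc_opt, disc_opt, prior_latents, deploy_obs):
    """One unsupervised adaptation step: no rewards, no paired observations.

    prior_latents: latents cached from the agent's training-time experience.
    deploy_obs:    a batch of raw observations collected in the deployment env.
    """
    # 1) Train the discriminator to separate prior latents from deployment latents.
    deploy_latents = encoder(deploy_obs).detach()
    disc_loss = F.binary_cross_entropy_with_logits(
        discriminator(prior_latents), torch.ones(prior_latents.size(0), 1)
    ) + F.binary_cross_entropy_with_logits(
        discriminator(deploy_latents), torch.zeros(deploy_latents.size(0), 1)
    )
    disc_opt.zero_grad(); disc_loss.backward(); disc_opt.step()

    # 2) Update only the encoder so that deployment latents look like prior
    #    experience; the policy head consuming the latents stays frozen.
    enc_loss = F.binary_cross_entropy_with_logits(
        discriminator(encoder(deploy_obs)), torch.ones(deploy_obs.size(0), 1)
    )
    enc_opt.zero_grad(); enc_loss.backward(); enc_opt.step()
    return disc_loss.item(), enc_loss.item()
```

In this sketch the pre-trained policy head is left untouched, so adaptation only reshapes the latent distribution the policy was trained on, which is the property the abstract describes.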