Over the years, various methods have been proposed for the problem of object detection. Recently, we have witnessed great strides in this domain owing to the emergence of powerful deep neural networks. However, these approaches typically share two main assumptions. First, the model is trained on a fixed training set and evaluated on a pre-recorded test set. Second, the model is kept frozen after the training phase, so no further updates are performed once training is finished. These two assumptions limit the applicability of these methods to real-world settings. In this paper, we propose Interactron, a method for adaptive object detection in an interactive setting, where the goal is to perform object detection in images observed by an embodied agent navigating in different environments. Our idea is to continue training during inference and adapt the model at test time, without any explicit supervision, by interacting with the environment. Our adaptive object detection model provides an 11.8-point improvement in AP (and 19.1 points in AP50) over DETR, a recent, high-performance object detector. Moreover, we show that our object detection model adapts to environments with completely different appearance characteristics, and that its performance is on par with a model trained with full supervision for those environments.