Over the years, various methods have been proposed for object detection. Recently, we have witnessed great strides in this domain owing to the emergence of powerful deep neural networks. However, these approaches typically share two main assumptions. First, the model is trained on a fixed training set and evaluated on a pre-recorded test set. Second, the model is kept frozen after the training phase, so no further updates are performed once training is finished. These two assumptions limit the applicability of these methods to real-world settings. In this paper, we propose Interactron, a method for adaptive object detection in an interactive setting, where the goal is to perform object detection in images observed by an embodied agent navigating in different environments. Our idea is to continue training during inference and adapt the model at test time, without any explicit supervision, by interacting with the environment. Our adaptive object detection model provides a 7.2 point improvement in AP (and 12.7 points in AP50) over DETR, a recent, high-performance object detector. Moreover, we show that our object detection model adapts to environments with completely different appearance characteristics and performs well in them. The code is available at https://github.com/allenai/interactron.
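To make the idea of adapting at test time without labels concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state which adaptation objective Interactron uses, so this sketch swaps in a generic stand-in: minimizing the mean prediction entropy over a handful of unlabeled frames. The names `adapt_bias` and `mean_entropy`, the per-class bias parameter, and the toy logits are all hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_entropy(frames, bias):
    """Average prediction entropy over all frames for a given bias."""
    total = 0.0
    for frame_logits in frames:
        total += entropy(softmax([z + b for z, b in zip(frame_logits, bias)]))
    return total / len(frames)

def adapt_bias(frames, bias, lr=0.1, steps=20):
    """Gradient descent on mean entropy; only the per-class bias is updated.

    ASSUMPTION: entropy minimization is a stand-in objective, not the
    objective used in the paper. No ground-truth labels are used.
    """
    bias = list(bias)
    for _ in range(steps):
        grad = [0.0] * len(bias)
        for frame_logits in frames:
            probs = softmax([z + b for z, b in zip(frame_logits, bias)])
            h = entropy(probs)
            for j, p in enumerate(probs):
                if p > 0.0:
                    # dH/dz_j = -p_j * (log p_j + H) for softmax outputs
                    grad[j] += -p * (math.log(p) + h) / len(frames)
        bias = [b - lr * g for b, g in zip(bias, grad)]
    return bias

# Hypothetical class logits from a frozen detector for one object,
# seen in three frames gathered while the agent moves around.
frames = [[2.0, 1.5, 0.1], [1.8, 1.6, 0.0], [2.2, 1.4, 0.3]]
initial_bias = [0.0, 0.0, 0.0]

entropy_before = mean_entropy(frames, initial_bias)
adapted_bias = adapt_bias(frames, initial_bias)
entropy_after = mean_entropy(frames, adapted_bias)
print(entropy_before, entropy_after)
```

The sketch keeps the detector's outputs fixed and updates only a small set of parameters from unlabeled observations, which mirrors the setting described above in spirit only. In an interactive setting, the frames would come from the agent's own movement rather than from a fixed list.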