Object detectors often experience a drop in performance when new environmental conditions are insufficiently represented in the training data. This paper studies how to automatically fine-tune a pre-existing object detector while exploring and acquiring images in a new environment without relying on human intervention, i.e., in a fully self-supervised fashion. In our setting, an agent initially learns to explore the environment using a pre-trained off-the-shelf detector to locate objects and associate pseudo-labels with them. By assuming that pseudo-labels for the same object must be consistent across different views, we learn an exploration policy that mines hard samples, and we devise a novel mechanism for producing refined predictions from the consensus among observations. Our approach outperforms the current state of the art and closes the performance gap with respect to a fully supervised setting without relying on ground-truth annotations. We also compare various exploration policies for the agent to gather more informative observations. Code and dataset will be made available upon paper acceptance.
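To make the cross-view consensus idea concrete, the following is a minimal illustrative sketch, not the paper's actual mechanism (which the abstract does not specify): it assumes detections have already been associated to the same physical object across views, and simply averages per-class detector scores to produce a refined pseudo-label. All names (`Detection`, `consensus_pseudo_label`, `object_id`) are hypothetical.

```python
# Hypothetical sketch of cross-view pseudo-label consensus.
# Assumption: detections of the same physical object are already associated
# across views via a shared object_id; the fusion rule (score averaging) is
# an illustrative choice, not the method described in the paper.
from dataclasses import dataclass
from collections import defaultdict
from typing import Dict, List


@dataclass
class Detection:
    object_id: int              # identity of the observed object across views
    class_scores: List[float]   # per-class confidence from the off-the-shelf detector


def consensus_pseudo_label(detections: List[Detection]) -> Dict[int, dict]:
    """Fuse detections of the same object across views into refined pseudo-labels."""
    by_object = defaultdict(list)
    for det in detections:
        by_object[det.object_id].append(det)

    labels = {}
    for obj_id, dets in by_object.items():
        n_cls = len(dets[0].class_scores)
        # Average class scores over all views observing this object.
        mean_scores = [sum(d.class_scores[c] for d in dets) / len(dets) for c in range(n_cls)]
        labels[obj_id] = {
            "class": max(range(n_cls), key=lambda c: mean_scores[c]),
            "confidence": max(mean_scores),
            "num_views": len(dets),
        }
    return labels
```

Under this toy rule, an object whose per-view predictions disagree would receive a lower consensus confidence, which is one way such observations could be flagged as hard samples for the exploration policy.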