Occlusion is one of the most significant challenges encountered by object detectors and trackers. While both object detection and tracking have received a lot of attention in the past, most existing methods in this domain do not target detecting or tracking objects when they are occluded. However, being able to detect or track an object of interest through occlusion has been a long-standing challenge for different autonomous tasks. Traditional methods that employ visual object trackers with explicit occlusion modeling experience drift and make several fundamental assumptions about the data. We propose to address this with a "tracking-by-detection" approach that builds upon the success of region-based video object detectors. Our video-level object detector uses a novel recurrent computational unit at its core that enables long-term propagation of object features even under occlusion. Finally, we compare our approach with existing state-of-the-art video object detectors and show that it achieves superior results on a dataset of furniture assembly videos collected from the internet, where small objects like screws, nuts, and bolts often get occluded from the camera viewpoint.