Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms that follow the objects manipulated by the camera wearer can provide useful cues for modeling such interactions. Visual tracking solutions in the computer vision literature have improved significantly in recent years across a large variety of target objects and tracking scenarios. However, despite a few previous attempts to exploit trackers in FPV applications, a methodical analysis of the performance of state-of-the-art trackers in this domain is still missing. In this paper, we fill this gap by presenting the first systematic study of object tracking in FPV. Our study extensively analyses the performance of recent visual trackers and baseline FPV trackers with respect to different aspects, and it considers a new performance measure. This is achieved through TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV is challenging, which suggests that more research effort should be devoted to this problem so that tracking can benefit FPV tasks.