Understanding human-object interactions is fundamental in First Person Vision (FPV). Visual tracking algorithms that follow the objects manipulated by the camera wearer can provide useful information to effectively model such interactions. In recent years, the computer vision community has significantly improved the performance of tracking algorithms for a large variety of target objects and scenarios. Despite a few previous attempts to exploit trackers in the FPV domain, a methodical analysis of the performance of state-of-the-art trackers is still missing. This research gap raises the question of whether current solutions can be used ``off-the-shelf'' or whether more domain-specific investigations should be carried out. This paper aims to answer such questions. We present the first systematic investigation of single object tracking in FPV. Our study extensively analyses the performance of 42 algorithms, including generic object trackers and baseline FPV-specific trackers. The analysis focuses on different aspects of the FPV setting, introduces new performance measures, and evaluates trackers in relation to FPV-specific tasks. The study is made possible by the introduction of TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV poses new challenges to current visual trackers. We highlight the factors causing such behavior and point out possible research directions. Despite these difficulties, we show that trackers bring benefits to FPV downstream tasks requiring short-term object tracking. We expect that generic object tracking will gain popularity in FPV as new and FPV-specific methodologies are investigated.