Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms that follow the objects manipulated by the camera wearer can provide useful information for modeling such interactions. Visual tracking solutions in the computer vision literature have improved significantly in recent years across a large variety of target objects and tracking scenarios. Yet, despite a few previous attempts to exploit trackers in FPV applications, a systematic analysis of the performance of state-of-the-art trackers in this domain is still missing. To fill this gap, we present TREK-100, the first benchmark dataset for visual object tracking in FPV. The dataset is composed of 100 video sequences densely annotated with 60K bounding boxes, 17 sequence attributes, 13 action verb attributes, and 29 target object attributes. Along with the dataset, we present an extensive analysis of the performance of 30 of the best and most recent visual trackers. Our results show that object tracking in FPV is challenging, which suggests that more research effort should be devoted to this problem.