ASOD60K: An Audio-Induced Salient Object Detection Dataset for Panoramic Videos

Exploring what humans pay attention to in dynamic panoramic scenes is useful for many fundamental applications, including augmented reality (AR) in retail, AR-powered recruitment, and visual-language navigation. With this goal in mind, we propose PV-SOD, a new task that aims to segment salient objects from panoramic videos. In contrast to existing fixation-/object-level saliency detection tasks, we focus on audio-induced salient object detection (SOD), where the salient objects are labeled with the guidance of audio-induced eye movements. To support this task, we collect the first large-scale dataset, named ASOD60K, which contains 4K-resolution video frames annotated with a six-level hierarchy, distinguishing itself by its richness, diversity, and quality. Specifically, each sequence is labeled with its super-class and sub-class, and the objects of each sub-class are further annotated with human eye fixations, bounding boxes, object-/instance-level masks, and associated attributes (e.g., geometrical distortion). These coarse-to-fine annotations enable detailed analysis for PV-SOD modelling, e.g., determining the major challenges for existing SOD models and predicting scanpaths to study the long-term eye fixation behaviors of humans. We systematically benchmark 11 representative approaches on ASOD60K and derive several interesting findings. We hope this study can serve as a good starting point for advancing SOD research towards panoramic videos. The dataset and benchmark will be made publicly available at https://github.com/PanoAsh/ASOD60K.
