360{\deg} videos have experienced booming development in recent years. Compared to traditional videos, 360{\deg} videos feature uncertain user behaviors, bringing opportunities as well as challenges. Datasets are necessary for researchers and developers to explore new ideas and conduct reproducible analyses for fair comparisons among different solutions. However, existing related datasets mostly focus on users' field of view (FoV), ignoring the more important eye gaze information, not to mention the integrated extraction and analysis of both FoV and eye gaze. Besides, users' behavior patterns are highly related to videos, yet most existing datasets only contain videos classified subjectively and qualitatively by genre, which lacks quantitative analysis and fails to characterize the intrinsic properties of a video scene. To this end, we first propose a quantitative taxonomy for 360{\deg} videos that contains three objective technical metrics. Based on this taxonomy, we collect a dataset containing users' head and gaze behaviors simultaneously, which outperforms existing datasets in its rich dimensions, large scale, strong diversity, and high frequency. We then conduct a pilot study on users' behaviors and obtain some interesting findings, such as that a user's head direction tends to follow his/her gaze direction after a most probable time interval. A case study of tile-based 360{\deg} video streaming based on our dataset is later conducted, demonstrating that existing works can achieve great performance improvements by leveraging the gaze information we provide. Our dataset is available at https://cuhksz-inml.github.io/head_gaze_dataset/