We consider the problem of predicting a user's head motion in 360-degree videos using only two modalities: the user's past positions and the video content (without access to other users' traces). We make two main contributions. First, we re-examine existing deep-learning approaches to this problem and, through a thorough root-cause analysis, identify hidden flaws. Re-assessing the existing methods that use both modalities, we obtain the surprising result that they all perform worse than baselines that use the user's trajectory only. The root-cause analysis of the metrics, datasets and neural architectures shows in particular that (i) the content can inform the prediction for horizons longer than 2 to 3 seconds (existing methods consider shorter horizons), and that (ii) to compete with the baselines, it is necessary, but not sufficient, to have a recurrent unit dedicated to processing the positions. Second, building on the results of this analysis and on a re-examination of the problem supported by the concept of Structural-RNN, we design a new deep neural architecture, named TRACK. TRACK achieves state-of-the-art performance on all considered datasets and prediction horizons, outperforming competitors by up to 20 percent on focus-type videos and horizons of 2 to 5 seconds. The entire framework (code and datasets) is available online and received an ACM reproducibility badge.
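To make the notion of a trajectory-only baseline concrete, the following is a minimal sketch, not the authors' implementation. It assumes head positions are given as yaw and pitch angles in radians, that the baseline simply repeats the last observed position over the prediction horizon, and that the error is measured as the great-circle angle on the unit sphere. All function names are hypothetical, and the abstract does not specify which baselines or metrics are used.

```python
import math


def to_unit_vector(yaw, pitch):
    """Convert yaw (longitude) and pitch (latitude), in radians, to a 3D unit vector."""
    return (
        math.cos(pitch) * math.cos(yaw),
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
    )


def great_circle_distance(p, q):
    """Angle in radians between two unit vectors on the sphere."""
    dot = sum(a * b for a, b in zip(p, q))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot)))


def static_baseline(past_positions, horizon_steps):
    """Hypothetical trajectory-only baseline: repeat the last observed position."""
    if not past_positions:
        raise ValueError("at least one past position is required")
    return [past_positions[-1]] * horizon_steps


def mean_error(predicted, actual):
    """Average great-circle error over the prediction horizon."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(great_circle_distance(p, a) for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Illustrative data (assumed): the user looks straight ahead, then turns left.
    past = [to_unit_vector(0.0, 0.0), to_unit_vector(0.05, 0.0)]
    future = [to_unit_vector(0.10, 0.0), to_unit_vector(0.20, 0.0)]
    prediction = static_baseline(past, horizon_steps=len(future))
    print(f"mean error (radians): {mean_error(prediction, future):.3f}")
```

A baseline of this kind ignores the video content entirely, which is why it serves as a useful reference point when judging whether a two-modality method actually benefits from the content.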