360-degree panoramic videos have gained considerable attention in recent years due to the rapid development of head-mounted displays (HMDs) and panoramic cameras. One major problem in streaming panoramic videos is that panoramic videos are much larger in size compared to traditional ones. Moreover, the user devices are often in a wireless environment, with limited battery, computation power, and bandwidth. To reduce resource consumption, researchers have proposed ways to predict the users' viewports so that only part of the entire video needs to be transmitted from the server. However, the robustness of such prediction approaches has been overlooked in the literature: it is usually assumed that only a few models, pre-trained on past users' experiences, are applied for prediction to all users. We observe that those pre-trained models can perform poorly for some users because they might have drastically different behaviors from the majority, and the pre-trained models cannot capture the features in unseen videos. In this work, we propose a novel meta learning based viewport prediction paradigm to alleviate the worst prediction performance and ensure the robustness of viewport prediction. This paradigm uses two machine learning models, where the first model predicts the viewing direction, and the second model predicts the minimum video prefetch size that can include the actual viewport. We first train two meta models so that they are sensitive to new training data, and then quickly adapt them to users while they are watching the videos. Evaluation results reveal that the meta models can adapt quickly to each user, and can significantly increase the prediction accuracy, especially for the worst-performing predictions.
翻译:360度全景视频近年来由于头部显示器和全景相机的迅速发展而引起相当的关注。 流出全景视频的一个主要问题是全景视频的规模比传统视频大得多。 此外,用户设备往往处于无线环境中,电池、计算能力和带宽有限。 为了减少资源消耗,研究人员提出了预测用户的视频门户的方法,这样整个视频中只有一部分需要从服务器上传输。 然而,文献中忽略了这类预测方法的稳健性:通常假设只有少数一些根据以往用户经验预先训练过的准确性模型用于所有用户的预测。 我们观察到,这些经过预先训练的模型对于一些用户来说效果很差,因为它们可能与大多数用户有着截然不同的行为,而经过预先训练的模型无法捕捉到隐蔽视频视频的特征。 在这项工作中,我们提出了一个新的基于元学习的最差的视觉预测模型,以降低最差的预测性能,并确保视觉预测的稳健性。 这个模型使用两种特别精准的模型,特别是根据过去用户的经验, 用于对所有用户的经验进行预测。