Delivering ultra-high-resolution 360-degree video (8K, 12K, or higher) over the internet requires viewport-dependent streaming to save bandwidth. During a viewport switch, the client and server must promptly exchange coordination information and the content for the new viewport. These switches pose a serious challenge for video encoding, because the temporal dependencies between content in changing viewports are unpredictable. In existing practice, it is commonly noted that the GOP (Group of Pictures) size of a bitstream intrinsically limits how far viewport switch latency, such as motion-to-photon (MTP) latency or motion-to-high-quality (MTHQ) latency, can be reduced. In this paper, we present a bitstream schema based on Scalable Video Coding (SVC) that structurally removes the impact of the GOP in viewport-dependent streaming and provides viewport switches within one frame time, the best possible. In addition, combined with tiling, this new coding schema allows efficient packing of the non-adjacent regions within a viewport of 360-degree video. Our experiments also show that overall encoding with this SVC-based approach is faster than with multi-stream approaches. Compared with current 360-degree video streaming solutions based on MPEG-I OMAF, our approach is superior in viewport switch latency, simplicity of viewport packing, and encoding performance.
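
To make the GOP limitation concrete, the sketch below compares a switch that must wait for the next random-access point with a switch completed within one frame time. It is a simplified illustration and not the paper's measurement method: the function names are hypothetical, the worst case is assumed to be one full GOP of waiting, and network and decoding delays are ignored.

```python
def frames_to_ms(frames: int, fps: float) -> float:
    """Convert a number of frame intervals to milliseconds."""
    if frames < 0 or fps <= 0:
        raise ValueError("frames must be >= 0 and fps must be > 0")
    return frames * 1000.0 / fps


def gop_bound_switch_latency_ms(gop_size: int, fps: float) -> float:
    """Assumed worst case for a GOP-bound stream.

    Assumption: the new viewport can only start decoding at the next
    keyframe, so a request arriving just after a keyframe waits up to
    one full GOP.
    """
    return frames_to_ms(gop_size, fps)


def one_frame_switch_latency_ms(fps: float) -> float:
    """Latency when a switch completes within a single frame time."""
    return frames_to_ms(1, fps)


if __name__ == "__main__":
    fps = 30.0
    for gop_size in (15, 30, 60):
        bound = gop_bound_switch_latency_ms(gop_size, fps)
        print(f"GOP {gop_size:>2} at {fps:.0f} fps: up to {bound:.1f} ms")
    print(f"one-frame switch: {one_frame_switch_latency_ms(fps):.1f} ms")
```

Under this model the GOP-bound latency grows linearly with GOP size, while the one-frame bound depends only on the frame rate.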