Modern media data such as 360-degree videos and light field (LF) images are typically captured in much higher dimensions than observers' visual displays can present. To efficiently browse high-dimensional media over bandwidth-constrained networks, a navigational streaming model is considered: a client navigates the large media space by dictating a navigation path to a server, which in response transmits the corresponding pre-encoded media data units (MDUs) to the client one by one in sequence. Intra-coding an MDU (I-MDU) results in a large bitrate, but an I-MDU can be randomly accessed; inter-coding an MDU (P-MDU) using another MDU as a predictor incurs a small coding cost, but imposes an order in which the predictor must be transmitted and decoded first. From a compression perspective, the technical challenge is how to achieve coding gain via inter-coding of MDUs while enabling adequate random access for satisfactory user navigation. To address this problem, we propose landmarks, a selection of key MDUs from the high-dimensional media. Using landmarks as predictors, nearby MDUs in local neighborhoods are inter-coded, resulting in a predictive MDU structure with controlled coding cost. This means that any requested MDU can be decoded by transmitting at most one landmark and one inter-coded MDU, which enables navigational random access. To build a landmarked MDU structure, we employ a tree-structured vector quantizer (TSVQ) to first optimize landmark locations, then iteratively add or remove inter-coded MDUs as refinements using a fast branch-and-bound technique. Taking interactive LF images and viewport-adaptive 360-degree images as illustrative applications, and using I-frames, P-frames, and previously proposed merge frames to intra- and inter-code MDUs, we show experimentally that landmarked MDU structures can noticeably reduce the expected transmission cost compared with MDU structures without landmarks.
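The random-access property described above (a requested MDU needs at most one landmark plus one inter-coded MDU) can be illustrated with a minimal sketch. It is not the proposed optimization itself: there is no TSVQ, branch-and-bound, or merge-frame coding here. MDUs are assumed to sit on a one-dimensional integer index, each non-landmark MDU is assumed to be predicted from its nearest landmark, and the costs `intra_cost` and `inter_cost` as well as the function names are hypothetical values chosen only for illustration.

```python
from typing import Iterable, Set


def nearest_landmark(mdu: int, landmarks: Set[int]) -> int:
    """Return the landmark closest to `mdu` (ties go to the smaller index)."""
    return min(landmarks, key=lambda lm: (abs(lm - mdu), lm))


def transmission_cost(mdu: int, landmarks: Set[int], cached: Set[int],
                      intra_cost: float = 10.0, inter_cost: float = 2.0) -> float:
    """Cost of serving one requested MDU, given what the client already holds.

    Hypothetical cost model: a landmark is sent as an I-MDU, and every other
    MDU is sent as a P-MDU predicted from its nearest landmark.
    """
    if mdu in cached:
        return 0.0                      # already decoded at the client
    if mdu in landmarks:
        return intra_cost               # landmark: randomly accessible I-MDU
    cost = inter_cost                   # P-MDU predicted from a landmark
    if nearest_landmark(mdu, landmarks) not in cached:
        cost += intra_cost              # predictor must be sent first
    return cost


def navigate(path: Iterable[int], landmarks: Set[int],
             intra_cost: float = 10.0, inter_cost: float = 2.0) -> float:
    """Total transmission cost of a navigation path, tracking the client cache."""
    cached: Set[int] = set()
    total = 0.0
    for mdu in path:
        total += transmission_cost(mdu, landmarks, cached, intra_cost, inter_cost)
        cached.add(mdu)
        if mdu not in landmarks:
            cached.add(nearest_landmark(mdu, landmarks))
    return total


if __name__ == "__main__":
    path = [0, 1, 2, 3, 8]
    print(navigate(path, landmarks={2, 7}))   # landmarked structure
    print(len(set(path)) * 10.0)              # all-intra baseline under the same toy costs
```

Under these toy costs, no single request costs more than `intra_cost + inter_cost`, and requests that stay within a neighborhood whose landmark is already cached cost only `inter_cost`.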