The demand for high-resolution video content has grown over the years. However, the delivery of high-resolution video is constrained either by the computational resources required for rendering or by the network bandwidth available for remote transmission. To remedy this limitation, we leverage the eye trackers found in existing augmented and virtual reality headsets. We propose applying video super-resolution (VSR) techniques to fuse low-resolution context with regional high-resolution context, enabling resource-constrained consumption of high-resolution content without a perceivable drop in quality. Eye trackers provide the gaze direction of a user, which aids the extraction of the regional high-resolution context. Since only pixels that fall within the gaze region can be resolved by the human eye, a large portion of the delivered content is redundant: we cannot perceive a difference in quality in regions beyond the observed one. To generate a visually pleasing frame from the fusion of the high-resolution and low-resolution regions, we study the capability of a deep neural network to transfer the context of the observed region to the other (low-resolution) regions of the current and future frames. We label this task Foveated Video Super-Resolution (FVSR), as the low-resolution regions of current and future frames must be super-resolved through the fusion of pixels from the gaze region. We propose Cross-Resolution Flow Propagation (CRFP) for FVSR. We train and evaluate CRFP on the REDS dataset on the task of 8x FVSR, i.e., a combination of 8x VSR and the fusion of the foveated region. Departing from the conventional per-frame quality evaluation using SSIM or PSNR, we propose evaluating past foveated regions, measuring a model's capability to leverage the noise present in eye trackers during FVSR. Code is made available at https://github.com/eugenelet/CRFP.
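To make the FVSR input concrete, the sketch below shows one plausible way to construct the two streams the abstract describes: an 8x low-resolution context frame and a full-resolution crop around the gaze point. This is a minimal illustration only, not the CRFP pipeline; the function name, the square fovea shape, and the naive strided downsampling are all assumptions for exposition.

```python
import numpy as np

def foveated_input(hr_frame, gaze_xy, fovea_radius, scale=8):
    """Illustrative FVSR input construction (hypothetical helper, not CRFP).

    Returns an 8x-downsampled context frame, a high-resolution square crop
    centered on the gaze point, and the crop's top-left offset in the
    original frame so the two can later be fused.
    """
    h, w, _ = hr_frame.shape
    # Low-resolution context: naive strided 8x downsampling (a real system
    # would use a proper anti-aliased resampler).
    lr_frame = hr_frame[::scale, ::scale]
    # High-resolution foveated region: square crop around the gaze, clipped
    # to the frame boundaries.
    x, y = gaze_xy
    x0, x1 = max(0, x - fovea_radius), min(w, x + fovea_radius)
    y0, y1 = max(0, y - fovea_radius), min(h, y + fovea_radius)
    fovea = hr_frame[y0:y1, x0:x1]
    return lr_frame, fovea, (y0, x0)

# Example: a 256x256 frame with the gaze at its center.
hr = np.random.rand(256, 256, 3)
lr, fovea, offset = foveated_input(hr, gaze_xy=(128, 128), fovea_radius=32)
```

Here `lr` is the 32x32x3 context delivered for the whole frame, while `fovea` is the 64x64x3 patch that carries full-resolution detail only where the user is looking; an FVSR model then super-resolves the rest of the frame using this patch.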