Implicit neural representations for videos (NeRV) have shown strong potential for video compression. However, applying NeRV to high-resolution 360-degree videos incurs high memory usage and slow decoding, making real-time applications impractical. We propose NeRV360, an end-to-end framework that decodes only the user-selected viewport instead of reconstructing the entire panoramic frame. Unlike conventional pipelines, NeRV360 integrates viewport extraction into the decoding process and introduces a spatial-temporal affine transform module for conditional decoding based on viewpoint and time. Experiments on 6K-resolution videos show that NeRV360 achieves a 7-fold reduction in memory consumption and a 2.5-fold increase in decoding speed over HNeRV, a representative prior method, while delivering better image quality on objective metrics.
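The abstract does not detail the internals of the spatial-temporal affine transform module. One plausible reading is FiLM-style feature modulation: a small network maps the (viewpoint, time) condition to per-channel scale and shift parameters applied to the decoder's feature maps. The sketch below illustrates only that general idea; the class name SpatialTemporalAffine, the (yaw, pitch, t) condition layout, and all dimensions are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn

class SpatialTemporalAffine(nn.Module):
    """Hypothetical FiLM-style affine modulation of decoder features,
    conditioned on viewpoint and time (a sketch, not the paper's module)."""

    def __init__(self, cond_dim: int, num_channels: int):
        super().__init__()
        # Maps the (viewpoint, time) condition vector to per-channel
        # scale (gamma) and shift (beta) parameters.
        self.to_affine = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, feat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) decoder feature map
        # cond: (B, cond_dim) embedding of viewport direction and frame index
        gamma, beta = self.to_affine(cond).chunk(2, dim=-1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)    # (B, C, 1, 1)
        return feat * (1 + gamma) + beta

# Usage: condition one decoder block on a normalized (yaw, pitch, t) triple.
feat = torch.randn(1, 64, 32, 32)
cond = torch.tensor([[0.25, -0.10, 0.5]])  # hypothetical condition layout
mod = SpatialTemporalAffine(cond_dim=3, num_channels=64)
out = mod(feat, cond)
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

Conditioning of this kind would let a single learned representation decode different viewports at different times without reconstructing the full panorama, which is consistent with the memory and speed gains the abstract reports.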