This study proposes a practical approach for compressing 360-degree equirectangular videos using pretrained neural video compression (NVC) models. Without requiring additional training or changes to the model architecture, the proposed method extends quantization parameter adaptation techniques from traditional video codecs to NVC, exploiting the spatially varying sampling density of the equirectangular projection. We introduce latitude-based adaptive quality parameters derived through rate-distortion optimization for NVC. The method further employs vector bank interpolation for latent modulation, enabling flexible adaptation to arbitrary quality parameters and mitigating the rounding errors that would otherwise arise in the adapted quantization parameters. Experimental results demonstrate that applying this method to the DCVC-RT framework yields BD-Rate savings of 5.2% in terms of the weighted spherical peak signal-to-noise ratio on the JVET class S1 test sequences, with only a 0.3% increase in processing time.
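To make the two mechanisms concrete, the sketch below illustrates the general idea under stated assumptions rather than the paper's exact formulation: per-row spherical sampling weights (the cosine-of-latitude weights used by WS-PSNR) drive a hypothetical logarithmic quality offset toward the poles, and a small bank of quality vectors is linearly interpolated so that the resulting non-integer quality values can modulate the latent without rounding. The mapping `latitude_quality`, the bank size, and the normalization constant `Q_MAX` are illustrative assumptions; the paper obtains its offsets via rate-distortion optimization.

```python
import numpy as np

def ws_weights(height: int) -> np.ndarray:
    """Per-row spherical sampling weights of an equirectangular frame
    (the cosine-of-latitude weights also used by WS-PSNR)."""
    rows = np.arange(height)
    return np.cos((rows + 0.5 - height / 2.0) * np.pi / height)

def latitude_quality(q_base: float, height: int, strength: float = 4.0) -> np.ndarray:
    """Hypothetical latitude-to-quality mapping: lower the quality parameter
    toward the poles, where each pixel covers less spherical area.
    (Illustrative only; the paper derives its offsets via RD optimization.)"""
    return q_base + strength * np.log2(ws_weights(height))

def interpolate_bank(bank: np.ndarray, q: np.ndarray, q_max: float) -> np.ndarray:
    """Linearly interpolate between adjacent vectors of a quality-vector bank
    so arbitrary (non-integer) quality values can modulate the latent,
    instead of rounding to the nearest trained index."""
    n = bank.shape[0]
    pos = np.clip(q / q_max, 0.0, 1.0) * (n - 1)   # continuous bank index per row
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    frac = (pos - lo)[:, None]
    return (1.0 - frac) * bank[lo] + frac * bank[hi]

if __name__ == "__main__":
    H, C, N_BANK, Q_MAX = 1024, 64, 4, 63.0
    bank = np.random.randn(N_BANK, C).astype(np.float32)   # stand-in for trained vectors
    q_rows = latitude_quality(q_base=40.0, height=H)        # per-row quality parameters
    mod = interpolate_bank(bank, q_rows, Q_MAX)              # (H, C) per-row modulation
    print(mod.shape)
```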