Succinct representation of complex signals using coordinate-based neural representations (CNRs) has seen great progress, and several recent efforts focus on extending them to handle videos. Here, the main challenge is how to (a) alleviate the compute-inefficiency of training CNRs while (b) achieving high-quality video encoding and (c) maintaining parameter-efficiency. To meet all three requirements simultaneously, we propose neural video representations with learnable positional features (NVP), a novel CNR that introduces "learnable positional features" to effectively amortize a video as latent codes. Specifically, we first present a CNR architecture built on 2D latent keyframes that learn the common video contents across each spatio-temporal axis, which dramatically improves on all three requirements. Then, we propose to utilize existing powerful image and video codecs as a compute- and memory-efficient compression procedure for the latent codes. We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior arts, NVP not only trains 2 times faster (in less than 5 minutes) but also exceeds their encoding quality, 34.07$\rightarrow$34.57 (measured in PSNR), even while using $>$8 times fewer parameters. We also demonstrate intriguing properties of NVP, e.g., video inpainting and video frame interpolation.
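To make the core idea of "2D latent keyframes across each spatio-temporal axis" concrete, the following is a minimal PyTorch sketch under our own simplifying assumptions: one learnable 2D feature plane per coordinate pair (xy, xt, yt), queried by bilinear interpolation and decoded to RGB by a small MLP. The grid resolution, feature dimension, and decoder width are illustrative choices, and this is not the official NVP architecture (which contains further components not covered in this abstract).

```python
# Minimal sketch (NOT the official NVP implementation) of latent-keyframe CNRs:
# a learnable 2D feature grid per spatio-temporal plane, sampled bilinearly
# and decoded to RGB by a small MLP. All hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKeyframeCNR(nn.Module):
    def __init__(self, feat_dim=16, grid_res=64, hidden=128):
        super().__init__()
        # One learnable 2D latent keyframe per coordinate plane.
        self.planes = nn.ParameterDict({
            name: nn.Parameter(0.01 * torch.randn(1, feat_dim, grid_res, grid_res))
            for name in ("xy", "xt", "yt")
        })
        self.decoder = nn.Sequential(
            nn.Linear(3 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # RGB output
        )

    def forward(self, coords):
        # coords: (N, 3) spatio-temporal queries (x, y, t), normalized to [-1, 1].
        x, y, t = coords[:, 0], coords[:, 1], coords[:, 2]
        feats = []
        for name, (u, v) in {"xy": (x, y), "xt": (x, t), "yt": (y, t)}.items():
            # grid_sample expects a (1, N, 1, 2) sampling grid.
            grid = torch.stack([u, v], dim=-1).view(1, -1, 1, 2)
            sampled = F.grid_sample(self.planes[name], grid, align_corners=True)
            feats.append(sampled.view(sampled.shape[1], -1).t())  # (N, feat_dim)
        return torch.sigmoid(self.decoder(torch.cat(feats, dim=-1)))

# Usage: fit the latent keyframes and decoder to a single video by regressing
# RGB values at sampled (x, y, t) coordinates (placeholder targets shown here).
model = LatentKeyframeCNR()
coords = torch.rand(4096, 3) * 2 - 1   # random spatio-temporal queries
target = torch.rand(4096, 3)           # placeholder RGB supervision
loss = F.mse_loss(model(coords), target)
loss.backward()
```

In this simplified view, the shared 2D planes are what would subsequently be compressed with off-the-shelf image/video codecs, since they are stored as dense 2D feature maps rather than network weights.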