Neural fields, also known as coordinate-based or implicit neural representations, have shown a remarkable capability for representing, generating, and manipulating various forms of signals. For video representations, however, mapping pixel-wise coordinates to RGB colors has yielded relatively low compression performance and slow convergence and inference. Frame-wise video representation, which maps a temporal coordinate to its entire frame, has recently emerged as an alternative method to represent videos, improving compression rates and encoding speed. While promising, it has still failed to reach the performance of state-of-the-art video compression algorithms. In this work, we propose FFNeRV, a novel method for incorporating flow information into frame-wise representations to exploit the temporal redundancy across frames in videos, inspired by standard video codecs. Furthermore, we introduce a fully convolutional architecture, enabled by one-dimensional temporal grids, improving the continuity of spatial features. Experimental results show that FFNeRV yields the best performance for video compression and frame interpolation among the methods using frame-wise representations or neural fields. To reduce the model size even further, we devise a more compact convolutional architecture using group and pointwise convolutions. With model compression techniques, including quantization-aware training and entropy coding, FFNeRV outperforms widely used standard video codecs (H.264 and HEVC) and performs on par with state-of-the-art video compression algorithms.
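The one-dimensional temporal grid mentioned above can be pictured as a learnable table of latent feature vectors indexed by continuous time, sampled by interpolation before being decoded into a frame. The following is a minimal sketch of that sampling step only, assuming linear interpolation over a grid of T latent vectors; the function name, shapes, and interpolation scheme are illustrative assumptions, not FFNeRV's actual implementation.

```python
import numpy as np

def sample_temporal_grid(grid, t):
    """Linearly interpolate a 1-D temporal grid at normalized time t in [0, 1].

    grid: (T, C) array of learnable latent feature vectors.
    Returns a (C,) latent feature, which a convolutional decoder
    would then map to an entire frame.
    """
    T = grid.shape[0]
    pos = t * (T - 1)              # continuous position on the grid
    lo = int(np.floor(pos))        # left neighbor index
    hi = min(lo + 1, T - 1)        # right neighbor index (clamped)
    w = pos - lo                   # interpolation weight
    return (1.0 - w) * grid[lo] + w * grid[hi]

# Usage: a toy grid with T=4 time slots and C=2 channels.
grid = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
latent = sample_temporal_grid(grid, 0.5)  # halfway through the video
```

Because every frame index maps to a weighted mix of neighboring grid entries, nearby times share features smoothly, which is what allows querying non-integer times for frame interpolation.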
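The size reduction from group and pointwise convolutions can be seen with a back-of-the-envelope parameter count: splitting a k×k convolution into g groups divides its weights by g, and a 1×1 pointwise convolution restores cross-channel mixing cheaply. This sketch uses hypothetical channel sizes, not FFNeRV's actual layer configuration.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a 2-D convolution with a square k x k kernel
    and the given number of groups (bias omitted for simplicity)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k

# Hypothetical layer: 64 -> 64 channels, 3x3 kernel, 8 groups.
c_in, c_out, k, g = 64, 64, 3, 8

standard = conv_params(c_in, c_out, k)            # dense 3x3 convolution
compact = (conv_params(c_in, c_out, k, groups=g)  # grouped 3x3 convolution
           + conv_params(c_in, c_out, 1))         # plus pointwise 1x1 mixing

print(standard, compact)  # 36864 vs 8704 weights in this toy setting
```

With these numbers the grouped-plus-pointwise pair needs roughly a quarter of the weights of the dense convolution, which is the kind of saving that makes room for quantization-aware training and entropy coding to compress the model further.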