Compression and reconstruction of visual data have been widely studied in the computer vision community, even before the popularization of deep learning. More recently, some approaches have used deep learning to improve or refine existing pipelines, while others have proposed end-to-end approaches, including autoencoders and implicit neural representations such as SIREN and NeRV. In this work, we propose Neural Visual Representation with Content-adaptive Embedding (CNeRV), which combines the generalizability of autoencoders with the simplicity and compactness of implicit representation. We introduce a novel content-adaptive embedding that is unified, concise, and internally (within-video) generalizable, and that complements a powerful decoder with a single-layer encoder. We match the performance of NeRV, a state-of-the-art implicit neural representation, on the reconstruction task for frames seen during training, while far surpassing it on frames skipped during training (unseen images). To achieve similar reconstruction quality on unseen images, NeRV needs 120x more training time to overfit each frame due to its lack of internal generalization. With the same latent code length and a similar model size, CNeRV outperforms autoencoders on reconstruction of both seen and unseen images. We also show promising results for visual data compression. More details can be found on the project page: https://haochen-rye.github.io/CNeRV/
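To make the encoder/decoder split described above concrete, here is a minimal, hypothetical sketch in PyTorch: a single-layer encoder maps a frame to a compact content-adaptive embedding, and a larger NeRV-style upsampling decoder reconstructs the frame. The layer sizes, kernel choices, and PixelShuffle upsampling are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Single-layer encoder: RGB frame -> compact content-adaptive embedding."""
    def __init__(self, embed_dim=64):
        super().__init__()
        # One strided conv maps a 3xHxW frame to a small spatial embedding.
        self.conv = nn.Conv2d(3, embed_dim, kernel_size=8, stride=8)

    def forward(self, frame):
        return self.conv(frame)

class NeRVStyleDecoder(nn.Module):
    """Powerful decoder: embedding -> reconstructed frame via upsampling blocks."""
    def __init__(self, embed_dim=64, width=128):
        super().__init__()
        # Three conv + PixelShuffle blocks undo the encoder's 8x downsampling.
        self.blocks = nn.Sequential(
            nn.Conv2d(embed_dim, width * 4, 3, padding=1),
            nn.PixelShuffle(2), nn.GELU(),
            nn.Conv2d(width, width * 4, 3, padding=1),
            nn.PixelShuffle(2), nn.GELU(),
            nn.Conv2d(width, 3 * 4, 3, padding=1),
            nn.PixelShuffle(2),
        )

    def forward(self, embedding):
        return self.blocks(embedding)

# Usage: encode a frame to its embedding (the latent code), then decode it back.
encoder, decoder = TinyEncoder(), NeRVStyleDecoder()
frame = torch.rand(1, 3, 256, 256)   # dummy RGB frame
embedding = encoder(frame)           # 1 x 64 x 32 x 32 embedding
reconstruction = decoder(embedding)  # 1 x 3 x 256 x 256 output
```

Because the encoder is only a single layer, nearly all capacity sits in the decoder, which is what lets such a design keep the compactness of an implicit representation while still generalizing to frames not seen during training.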