Videos typically record streaming, continuous visual data as discrete consecutive frames. Since high-fidelity video is expensive to store, most videos are kept at a relatively low resolution and frame rate. Recent works on Space-Time Video Super-Resolution (STVSR) incorporate temporal interpolation and spatial super-resolution into a unified framework. However, most of them support only a fixed up-sampling scale, which limits their flexibility and applications. In this work, instead of following these discrete representations, we propose Video Implicit Neural Representation (VideoINR) and show its application to STVSR. The learned implicit neural representation can be decoded to videos of arbitrary spatial resolution and frame rate. We show that VideoINR achieves performance competitive with state-of-the-art STVSR methods on common up-sampling scales and significantly outperforms prior works on continuous and out-of-training-distribution scales. Our project page is at http://zeyuan-chen.com/VideoINR/ .
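To make the continuous decoding concrete, below is a minimal sketch of a coordinate-based video INR in PyTorch: an MLP maps a continuous space-time coordinate (x, y, t) to an RGB value, so the same trained network can be sampled on any spatial grid and at any frame rate. The names here (`CoordMLP`, `decode_video`) are hypothetical, and this is a simplified illustration rather than the actual VideoINR model, which additionally conditions on encoded features of the input frames and a learned motion flow.

```python
import torch
import torch.nn as nn

class CoordMLP(nn.Module):
    """Toy coordinate-based INR (illustrative, not the VideoINR architecture):
    maps a continuous (x, y, t) coordinate in [-1, 1]^3 to an RGB value."""
    def __init__(self, hidden=256, layers=4):
        super().__init__()
        dims = [3] + [hidden] * layers + [3]  # input (x, y, t) -> output RGB
        blocks = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            blocks += [nn.Linear(d_in, d_out), nn.ReLU()]
        self.net = nn.Sequential(*blocks[:-1])  # drop the final ReLU

    def forward(self, coords):  # coords: (N, 3)
        return self.net(coords)

def decode_video(inr, height, width, num_frames):
    """Sample the INR on an arbitrary space-time grid: any (height, width)
    gives the spatial scale, and any num_frames gives the frame rate."""
    ys = torch.linspace(-1, 1, height)
    xs = torch.linspace(-1, 1, width)
    ts = torch.linspace(-1, 1, num_frames)
    t, y, x = torch.meshgrid(ts, ys, xs, indexing="ij")
    coords = torch.stack([x, y, t], dim=-1).reshape(-1, 3)
    rgb = inr(coords)
    return rgb.reshape(num_frames, height, width, 3)

# The same network decodes to different resolutions and frame rates,
# which is what removes the fixed up-sampling-scale restriction.
inr = CoordMLP()
video_2x = decode_video(inr, height=128, width=128, num_frames=30)
video_4x = decode_video(inr, height=256, width=256, num_frames=60)
```

Because the representation is queried at continuous coordinates rather than on a fixed output grid, up-sampling scales outside the training distribution require no architectural change, only a denser sampling grid.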