With the ubiquity of rolling shutter (RS) cameras, it is becoming increasingly attractive to recover the latent global shutter (GS) video from two consecutive RS frames, which also places a higher demand on realism. Existing solutions, using deep neural networks or optimization, achieve promising performance. However, these methods generate intermediate GS frames through image warping based on the RS model, which inevitably results in black holes and noticeable motion artifacts. In this paper, we alleviate these issues by proposing a context-aware GS video reconstruction architecture that exploits occlusion reasoning, motion compensation, and temporal abstraction. Specifically, we first estimate the bilateral motion field so that the pixels of the two RS frames are warped to a common GS frame accordingly. Then, a refinement scheme, guided by bilateral occlusion masks, is proposed to steer the GS frame synthesis and produce high-fidelity GS video frames at arbitrary times. Furthermore, we derive an approximated bilateral motion field model, which can serve as an alternative that provides a simple yet effective GS frame initialization for related tasks. Experiments on synthetic and real data show that our approach achieves superior performance over state-of-the-art methods in terms of objective metrics and subjective visual quality. Code is available at \url{https://github.com/GitCVfb/CVR}.