The performance of video frame interpolation is inherently correlated with the ability to handle motion in the input scene. Even though previous works recognize the utility of asynchronous event information for this task, they ignore the fact that motion may or may not result in blur in the input video to be interpolated, depending on the length of the exposure time of the frames and the speed of the motion. As a result, they assume either that the input video is sharp, restricting themselves to frame interpolation, or that it is blurry, including an explicit, separate deblurring stage before interpolation in their pipeline. We instead propose a general method for event-based frame interpolation that performs deblurring ad hoc and thus works on both sharp and blurry input videos. Our model consists of a bidirectional recurrent network that naturally incorporates the temporal dimension of interpolation and fuses information from the input frames and the events adaptively based on their temporal proximity. In addition, we introduce a novel real-world high-resolution dataset with events and color videos named HighREV, which provides a challenging evaluation setting for the examined task. Extensive experiments on the standard GoPro benchmark and on our dataset show that our network consistently outperforms previous state-of-the-art methods on frame interpolation, single image deblurring, and the joint task of interpolation and deblurring. Our code and dataset will be made publicly available.