Recently, video frame interpolation using a combination of frame- and event-based cameras has surpassed traditional image-based methods in both performance and memory efficiency. However, current methods still suffer from (i) brittle image-level fusion of complementary interpolation results, which fails in the presence of artifacts in the fused image, (ii) potentially temporally inconsistent and inefficient motion estimation procedures that run for every inserted frame, and (iii) low-contrast regions that do not trigger events and thus cause events-only motion estimation to generate artifacts. Moreover, previous methods were only tested on datasets consisting of planar and faraway scenes, which do not capture the full complexity of the real world. In this work, we address the above problems by introducing multi-scale feature-level fusion and computing one-shot non-linear inter-frame motion from events and images, which can be efficiently sampled for image warping. We also collect the first large-scale dataset of events and frames, consisting of more than 100 challenging scenes with depth variations, captured with a new experimental setup based on a beamsplitter. We show that our method improves the reconstruction quality by up to 0.2 dB in terms of PSNR and by up to 15% in terms of LPIPS score.
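To make the "one-shot non-linear motion, efficiently sampled for warping" idea concrete, here is a minimal, illustrative sketch and not the paper's actual implementation: it assumes a hypothetical per-pixel polynomial motion model (coefficients estimated once per frame pair) whose flow is evaluated at any intermediate timestamp and used for a simple nearest-neighbor backward warp.

```python
import numpy as np

def sample_flow(coeffs, tau):
    # Evaluate a per-pixel polynomial motion model at normalized time tau in [0, 1].
    # coeffs: (K, H, W, 2) array of polynomial coefficients (a hypothetical
    # parameterization; the constant term is omitted so that flow(0) = 0).
    # Returns a dense flow field of shape (H, W, 2).
    K = coeffs.shape[0]
    powers = np.array([tau ** (k + 1) for k in range(K)])
    return np.tensordot(powers, coeffs, axes=1)

def backward_warp(img, flow):
    # Warp img by looking up each output pixel at (x + u, y + v),
    # using nearest-neighbor sampling with border clamping.
    H, W = img.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    xw = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    yw = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    return img[yw, xw]
```

Because the motion coefficients are computed once, inserting additional frames only requires re-evaluating `sample_flow` at new timestamps, which is what makes one-shot motion estimation cheaper than re-running estimation per inserted frame.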