This paper proposes a novel 4K video frame interpolator based on a bilateral transformer (BiFormer). The interpolator performs three steps: global motion estimation, local motion refinement, and frame synthesis. First, for global motion estimation, we predict symmetric bilateral motion fields at a coarse scale; to this end, we propose BiFormer, the first transformer-based bilateral motion estimator. Second, we efficiently refine the global motion fields using blockwise bilateral cost volumes (BBCVs). Third, we warp the input frames using the refined motion fields and blend them to synthesize an intermediate frame. Extensive experiments demonstrate that the proposed BiFormer algorithm achieves excellent interpolation performance on 4K datasets. The source code is available at https://github.com/JunHeum/BiFormer.
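To make the third step concrete, the following is a minimal NumPy sketch of warping two input frames with a symmetric bilateral motion field and blending them. It is an illustration under stated assumptions, not the released BiFormer implementation: the function names `backward_warp` and `synthesize_middle_frame` are hypothetical, the intermediate frame is assumed to lie at the temporal midpoint, the motion toward the first frame is assumed to be the exact negation of the motion toward the second frame, and the blend is a plain average rather than a learned synthesis network.

```python
import numpy as np


def backward_warp(frame, flow):
    """Bilinearly sample `frame` at (x + dx, y + dy) for every output pixel.

    frame: (H, W) or (H, W, C) float array.
    flow:  (H, W, 2) array holding (dx, dy) per pixel (hypothetical layout).
    Sampling positions outside the frame are clamped to the border.
    """
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)

    # Sampling coordinates, clamped to the valid image area.
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners of the bilinear cell and fractional offsets.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    if frame.ndim == 3:
        # Broadcast the weights over the channel axis.
        wx = wx[..., None]
        wy = wy[..., None]

    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x1]
    bottom = (1 - wx) * frame[y1, x0] + wx * frame[y1, x1]
    return (1 - wy) * top + wy * bottom


def synthesize_middle_frame(frame0, frame1, flow_t_to_1):
    """Warp both inputs to the midpoint and average them.

    Assumption: the bilateral motion is symmetric, so the motion from the
    intermediate frame to frame0 is the negation of the motion to frame1.
    """
    warped0 = backward_warp(frame0, -flow_t_to_1)
    warped1 = backward_warp(frame1, flow_t_to_1)
    return 0.5 * (warped0 + warped1)
```

For example, if a single bright pixel moves two pixels to the right between the inputs, a constant field of `(dx, dy) = (1, 0)` places it one pixel to the right of its starting position in the synthesized frame. Clamping at the border and the uniform average are simplifications chosen to keep the sketch short.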