We present All-Pairs Multi-Field Transforms (AMT), a new network architecture for video frame interpolation. It is based on two essential designs. First, we build bidirectional correlation volumes for all pairs of pixels, and use the predicted bilateral flows to retrieve correlations for updating both the flows and the interpolated content feature. Second, we derive multiple groups of fine-grained flow fields from one pair of updated coarse flows and use them to perform backward warping on the input frames separately. Combining these two designs enables us to generate promising task-oriented flows and reduces the difficulty of modeling large motions and handling occluded areas during frame interpolation. These qualities enable our model to achieve state-of-the-art performance on various benchmarks with high efficiency. Moreover, our convolution-based model compares favorably against Transformer-based models in terms of both accuracy and efficiency. Our code is available at https://github.com/MCG-NKU/AMT.
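To make the two design components concrete, below is a minimal PyTorch sketch of (a) an all-pairs correlation volume between two feature maps and (b) backward warping of a frame with a predicted flow field. This is an illustrative assumption-based sketch, not the authors' implementation (see the linked repository); the function names, feature shapes, and normalization choices are ours.

```python
import torch
import torch.nn.functional as F

def all_pairs_correlation(feat0, feat1):
    # feat0, feat1: (B, C, H, W) features from the two input frames.
    # Returns a correlation volume of shape (B, H*W, H, W): the scaled dot
    # product between every pixel of feat0 and every pixel of feat1.
    B, C, H, W = feat0.shape
    f0 = feat0.flatten(2)                                   # (B, C, H*W)
    f1 = feat1.flatten(2)                                   # (B, C, H*W)
    corr = torch.einsum('bcn,bcm->bnm', f0, f1) / C ** 0.5  # (B, H*W, H*W)
    return corr.view(B, H * W, H, W)

def backward_warp(frame, flow):
    # frame: (B, C, H, W); flow: (B, 2, H, W) in pixel units.
    # Samples `frame` at locations displaced by `flow` with bilinear
    # interpolation (standard backward warping).
    B, _, H, W = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, device=frame.device, dtype=frame.dtype),
        torch.arange(W, device=frame.device, dtype=frame.dtype),
        indexing='ij')
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow  # (B, 2, H, W)
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    gx = 2.0 * grid[:, 0] / max(W - 1, 1) - 1.0
    gy = 2.0 * grid[:, 1] / max(H - 1, 1) - 1.0
    return F.grid_sample(frame, torch.stack((gx, gy), dim=-1),
                         align_corners=True)
```

In this reading of the abstract, the correlation volume is queried with the predicted bilateral flows to refine them, and several fine-grained flow fields derived from each coarse flow are used to warp the corresponding input frame separately before the warped results are fused into the interpolated frame.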