In this paper, we first present X4K1000FPS, a dataset of 4K videos at 1000 fps with extreme motion, to the research community for video frame interpolation (VFI). We then propose an extreme VFI network, called XVFI-Net, which is the first to handle VFI for 4K videos with large motion. XVFI-Net is based on a recursive multi-scale shared structure that consists of two cascaded modules: one for bidirectional optical flow learning between the two input frames (BiOF-I) and one for bidirectional optical flow learning from the target frame to the input frames (BiOF-T). The optical flows are stably approximated by complementary flow reversal (CFR), proposed in the BiOF-T module. During inference, the BiOF-I module can start at any scale of the input, while the BiOF-T module operates only at the original input scale, so that inference can be accelerated while maintaining highly accurate VFI performance. Extensive experimental results show that XVFI-Net successfully captures the essential information of objects with extremely large motions and complex textures, whereas state-of-the-art methods exhibit poor performance. Furthermore, the XVFI-Net framework also performs comparably on a previous lower-resolution benchmark dataset, which shows the robustness of our algorithm as well. All source code, pre-trained models, and the proposed X4K1000FPS dataset are publicly available at https://github.com/JihyongOh/XVFI.
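The inference behavior described above, with BiOF-I starting at a chosen scale and BiOF-T running only at the original scale, can be sketched as a simple schedule. This is an illustrative sketch only, not the authors' implementation: the function name `inference_schedule`, the parameter `start_level`, and the assumption that each scale level halves the resolution are all hypothetical.

```python
from typing import List, Tuple


def inference_schedule(start_level: int) -> List[Tuple[str, int]]:
    """Return (module, downscale_factor) steps, ordered coarse to fine.

    Assumption (not stated in the abstract): scale level s corresponds to
    a spatial downscale factor of 2**s, with level 0 the original scale.
    """
    if start_level < 0:
        raise ValueError("start_level must be non-negative")

    steps: List[Tuple[str, int]] = []
    # BiOF-I is applied recursively, starting at the chosen coarse level
    # and ending at the original input scale (factor 1).
    for level in range(start_level, -1, -1):
        steps.append(("BiOF-I", 2 ** level))
    # BiOF-T operates only at the original input scale.
    steps.append(("BiOF-T", 1))
    return steps


if __name__ == "__main__":
    for module, factor in inference_schedule(2):
        print(f"{module} at 1/{factor} resolution")
```

Under these assumptions, a larger `start_level` adds coarser BiOF-I passes, which is where large motions would be easier to capture, while the single BiOF-T pass keeps the final estimate at full resolution.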