Quantitative evaluation has increased dramatically in recent video inpainting work, but the video and mask content used to gauge performance has received relatively little attention. Although attributes such as camera and background scene motion inherently change the difficulty of the task and affect methods differently, existing evaluation schemes fail to control for them, thereby providing minimal insight into inpainting failure modes. To address this gap, we propose the Diagnostic Evaluation of Video Inpainting on Landscapes (DEVIL) benchmark, which consists of two contributions: (i) a novel dataset of videos and masks labeled according to several key inpainting failure modes, and (ii) an evaluation scheme that samples slices of the dataset characterized by a fixed content attribute and scores performance on each slice according to reconstruction, realism, and temporal consistency quality. By revealing systematic changes in performance induced by particular characteristics of the input content, our challenging benchmark enables more insightful analysis of video inpainting methods and serves as an invaluable diagnostic tool for the field. Our code and data are available at https://github.com/MichiganCOG/devil.
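To make the slice-based evaluation scheme concrete, the following is a minimal Python sketch of attribute-conditioned scoring, assuming per-video metric records; the attribute names, metric names, and data layout here are illustrative assumptions and do not reflect the benchmark's actual API.

```python
from collections import defaultdict

# Hypothetical per-video records: one content attribute plus metric scores
# produced by some inpainting method (names are illustrative, not DEVIL's API).
results = [
    {"camera_motion": "high", "psnr": 28.4, "fid": 41.2, "warp_error": 0.012},
    {"camera_motion": "low",  "psnr": 31.9, "fid": 33.7, "warp_error": 0.007},
    {"camera_motion": "low",  "psnr": 30.5, "fid": 35.1, "warp_error": 0.009},
]

def score_slices(results, attribute):
    """Group records by one fixed content attribute and average each metric per slice."""
    slices = defaultdict(list)
    for record in results:
        slices[record[attribute]].append(record)
    summary = {}
    for value, records in slices.items():
        metric_names = [k for k in records[0] if k != attribute]
        summary[value] = {
            m: sum(r[m] for r in records) / len(records) for m in metric_names
        }
    return summary

# Comparing per-slice averages exposes how a single attribute (e.g., camera
# motion) systematically shifts a method's performance.
print(score_slices(results, "camera_motion"))
```

Because each slice holds one content attribute fixed, differences between slice scores can be attributed to that attribute rather than to uncontrolled variation in the test content.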