Shadow removal in a single image has received increasing attention in recent years. However, removing shadows from dynamic scenes remains largely under-explored. In this paper, we propose the first data-driven video shadow removal model, termed PSTNet, which exploits three essential characteristics of video shadows: physical properties, spatial relations, and temporal coherence. Specifically, we establish a dedicated physical branch that performs local illumination estimation, which is better suited to scenes with complex lighting and textures, and then enhance the physical features via a mask-guided attention strategy. Next, we develop a progressive aggregation module that enhances the spatial and temporal characteristics of the feature maps and effectively integrates the three kinds of features. Furthermore, to address the lack of datasets of paired shadow videos, we synthesize a dataset (SVSRD-85) with the aid of the popular game GTAV by toggling its shadow renderer on and off. Experiments against 9 state-of-the-art models, including image shadow removers and image/video restoration methods, show that our method improves on the best state-of-the-art result by 14.7 in terms of RMSE for the shadow area. In addition, we develop a lightweight model adaptation strategy that makes our model, trained on synthetic data, effective in real-world scenes. Visual comparisons on the public SBU-TimeLapse dataset verify the generalization ability of our model in real scenes.
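The headline result above is reported as RMSE restricted to the shadow area. Below is a minimal sketch of such a masked RMSE. It assumes single-channel frames given as nested lists of equal shape and a binary shadow mask, and the function name `masked_rmse` is hypothetical. The exact evaluation protocol is not stated here: any color-space conversion (shadow-removal benchmarks often compute RMSE in LAB space) and whether scores are pooled or averaged per frame over a video are omitted.

```python
import math
from typing import Sequence


def masked_rmse(pred: Sequence[Sequence[float]],
                target: Sequence[Sequence[float]],
                mask: Sequence[Sequence[int]]) -> float:
    """Root-mean-square error between pred and target over pixels where mask is nonzero.

    Hypothetical helper: single-channel 2-D frames, mask marks shadow pixels.
    """
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target and mask must have the same number of rows")
    sq_err = 0.0
    count = 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        if not (len(p_row) == len(t_row) == len(m_row)):
            raise ValueError("pred, target and mask rows must have equal length")
        for p, t, m in zip(p_row, t_row, m_row):
            if m:  # only shadow pixels contribute to the error
                sq_err += (p - t) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    return math.sqrt(sq_err / count)


# Unmasked pixels (the 9.0 entries) are ignored; masked errors are 1 and 7.
print(masked_rmse([[1.0, 9.0], [9.0, 7.0]],
                  [[0.0, 0.0], [0.0, 0.0]],
                  [[1, 0], [0, 1]]))  # 5.0
```

Restricting the average to masked pixels keeps large shadow-free regions, which a remover should leave untouched, from diluting the error on the region that actually changes.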