When a composite image is produced by cut-and-paste, geometric inconsistency between the foreground and the background can severely harm its fidelity. To address this inconsistency, several existing works learn to warp the foreground object for geometric correction. However, the absence of an annotated dataset results in unsatisfactory performance and unreliable evaluation. In this work, we contribute a Spatial TRAnsformation for virtual Try-on (STRAT) dataset covering three typical application scenarios. Moreover, previous works simply concatenate the foreground and background as input without considering their mutual correspondence. Instead, we propose a novel correspondence learning network (CorrelNet) that models the correspondence between foreground and background using cross-attention maps, based on which we predict the target coordinate on the background that each source coordinate of the foreground should be mapped to. The warping parameters of the foreground object can then be derived from the pairs of source and target coordinates. Additionally, we learn a filtering mask that eliminates noisy coordinate pairs so that the warping parameters are estimated more accurately. Extensive experiments on our STRAT dataset demonstrate that the proposed CorrelNet performs favorably against previous methods.
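The step of deriving warping parameters from filtered coordinate pairs can be illustrated with a minimal sketch. The abstract does not state which warp family or solver CorrelNet uses, so the code below assumes an affine warp fitted by weighted least squares, with the filtering mask acting as per-pair weights. The function names `fit_affine` and `warp_points` are hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_affine(src, dst, mask=None):
    """Fit a 2x3 affine matrix A so that dst ~= A @ [x, y, 1] (assumed warp model).

    src, dst: (N, 2) arrays of source and predicted target coordinates.
    mask:     (N,) non-negative weights; 0 discards a noisy pair entirely.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    w = np.ones(n) if mask is None else np.asarray(mask, dtype=float)

    # Homogeneous source coordinates: one row [x, y, 1] per pair.
    X = np.hstack([src, np.ones((n, 1))])

    # Weighted least squares: scale both sides by sqrt(weight).
    sw = np.sqrt(w)[:, None]
    params, *_ = np.linalg.lstsq(sw * X, sw * dst, rcond=None)

    # lstsq returns a (3, 2) solution; transpose to the usual (2, 3) layout.
    return params.T


def warp_points(A, pts):
    """Apply a 2x3 affine matrix to (N, 2) points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ A[:, :2].T + A[:, 2]
```

In this sketch, setting a pair's mask value to zero removes its influence on the fit, which mirrors the role the abstract assigns to the learned filtering mask; a soft mask in (0, 1) would down-weight a pair instead of discarding it.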