Due to the difficulty of collecting large-scale, perfectly aligned paired training data for Under-Display Camera (UDC) image restoration, previous methods resort to monitor-based imaging systems or simulation-based methods, sacrificing the realism of the data and introducing domain gaps. In this work, we revisit the classic stereo setup for training data collection: capturing two images of the same scene with one UDC and one standard camera. The key idea is to "copy" details from a high-quality reference image and "paste" them onto the UDC image. While this setup can generate real training pairs, it is susceptible to spatial misalignment caused by perspective and depth-of-field changes. The problem is further compounded by the large domain discrepancy between UDC and normal images, which is unique to UDC restoration. In this paper, we mitigate the non-trivial domain discrepancy and spatial misalignment through a novel Transformer-based framework that generates well-aligned, high-quality target data for the corresponding UDC input. This is made possible through two carefully designed components, namely the Domain Alignment Module (DAM) and the Geometric Alignment Module (GAM), which encourage robust and accurate discovery of correspondence between the UDC and normal views. Extensive experiments show that high-quality and well-aligned pseudo UDC training pairs are beneficial for training a robust restoration network. Code and the dataset are available at https://github.com/jnjaby/AlignFormer.
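To make the described pipeline concrete, below is a minimal sketch of the data-generation flow the abstract outlines: domain alignment of the two views, followed by attention-based correspondence that transfers reference details onto the UDC viewpoint. All module internals, class names (`DomainAlignmentModule`, `GeometricAlignmentModule`), and the `generate_pseudo_pair` helper are illustrative assumptions, not the authors' implementation; see the repository linked above for the actual code.

```python
# Hypothetical sketch of the AlignFormer data-generation pipeline.
# Internals are illustrative stand-ins, not the paper's architecture.
import torch
import torch.nn as nn


class DomainAlignmentModule(nn.Module):
    """Stands in for DAM: narrows the UDC/normal domain gap so that
    correspondence matching is not confused by appearance differences.
    Here, a small shared conv encoder (assumption)."""
    def __init__(self, channels=64):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return self.encode(x)


class GeometricAlignmentModule(nn.Module):
    """Stands in for GAM: discovers dense correspondence between the UDC
    and reference views and "copies" reference details onto the UDC
    viewpoint, modeled here as cross-attention (assumption)."""
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.to_rgb = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, udc_feat, ref_feat):
        b, c, h, w = udc_feat.shape
        q = udc_feat.flatten(2).transpose(1, 2)   # queries from the UDC view
        kv = ref_feat.flatten(2).transpose(1, 2)  # keys/values from reference
        aligned, _ = self.attn(q, kv, kv)         # transfer details via attention
        aligned = aligned.transpose(1, 2).reshape(b, c, h, w)
        return self.to_rgb(aligned)               # pseudo target, UDC-aligned


def generate_pseudo_pair(udc_img, ref_img, dam, gam):
    """Pairs a real UDC input with a reference-quality target that is
    geometrically aligned to the UDC viewpoint."""
    pseudo_gt = gam(dam(udc_img), dam(ref_img))
    return udc_img, pseudo_gt


if __name__ == "__main__":
    dam, gam = DomainAlignmentModule(), GeometricAlignmentModule()
    udc = torch.rand(1, 3, 64, 64)  # image from the under-display camera
    ref = torch.rand(1, 3, 64, 64)  # image from the standard camera
    _, target = generate_pseudo_pair(udc, ref, dam, gam)
    print(target.shape)  # torch.Size([1, 3, 64, 64])
```

The resulting `(udc, target)` pairs would then serve as training data for a downstream restoration network, which is the use case the experiments in the paper evaluate.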