Blind face restoration aims to recover a high-quality face image from an input with unknown degradations. Because face images contain abundant contextual information, we propose RestoreFormer, a method that uses fully-spatial attention to model this contextual information and surpasses existing works that rely on local operators. RestoreFormer has several benefits compared to prior art. First, unlike the conventional multi-head self-attention in previous Vision Transformers (ViTs), RestoreFormer incorporates a multi-head cross-attention layer to learn fully-spatial interactions between corrupted queries and high-quality key-value pairs. Second, the key-value pairs in RestoreFormer are sampled from a reconstruction-oriented high-quality dictionary, whose elements are rich in high-quality facial features specifically aimed at face reconstruction, leading to superior restoration results. Third, RestoreFormer outperforms state-of-the-art methods on one synthetic dataset and three real-world datasets, and produces images with better visual quality.
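To make the cross-attention idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes flattened feature maps, random projection weights, and a nearest-neighbour lookup as the way key-value pairs are drawn from the dictionary; the abstract does not specify the sampling rule, and all function and variable names here (`sample_priors`, `multi_head_cross_attention`, `w_q`, and so on) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sample_priors(degraded, dictionary):
    # ASSUMPTION: each spatial position picks its nearest dictionary entry.
    # degraded: (N, C), dictionary: (K, C) -> priors: (N, C)
    dist2 = ((degraded[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=-1)
    return dictionary[dist2.argmin(axis=1)]


def multi_head_cross_attention(degraded, priors, w_q, w_k, w_v, w_o, num_heads):
    # Queries come from the corrupted features; keys and values come from
    # the high-quality priors. Every query attends to every prior position,
    # which is what makes the interaction fully spatial.
    n, c = degraded.shape
    m = priors.shape[0]
    d = c // num_heads
    q = (degraded @ w_q).reshape(n, num_heads, d).transpose(1, 0, 2)  # (H, N, d)
    k = (priors @ w_k).reshape(m, num_heads, d).transpose(1, 0, 2)    # (H, M, d)
    v = (priors @ w_v).reshape(m, num_heads, d).transpose(1, 0, 2)    # (H, M, d)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)                    # (H, N, M)
    weights = softmax(scores, axis=-1)
    fused = (weights @ v).transpose(1, 0, 2).reshape(n, c)            # (N, C)
    return fused @ w_o, weights


# Toy sizes (hypothetical): a 4x4 feature map with 8 channels, 32 dictionary entries.
rng = np.random.default_rng(0)
n, c, k, heads = 16, 8, 32, 2
degraded = rng.normal(size=(n, c))
dictionary = rng.normal(size=(k, c))
w_q, w_k, w_v, w_o = (rng.normal(scale=c ** -0.5, size=(c, c)) for _ in range(4))

priors = sample_priors(degraded, dictionary)
out, weights = multi_head_cross_attention(
    degraded, priors, w_q, w_k, w_v, w_o, num_heads=heads
)
```

In a self-attention layer the queries, keys, and values would all be projected from `degraded`; the only change here is that keys and values are projected from `priors`, so the output at each position is a weighted mixture of high-quality features.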