Blind Face Restoration (BFR) encounters inherent challenges in exploring its large solution space, leading to common artifacts such as missing details and identity ambiguity in the restored images. To tackle these challenges, we propose a Likelihood-Regularized Policy Optimization (LRPO) framework, the first to apply online reinforcement learning (RL) to the BFR task. LRPO leverages rewards from sampled candidates to refine the policy network, increasing the likelihood of high-quality outputs and improving restoration performance on low-quality inputs. However, directly applying RL to BFR introduces incompatibility issues, producing restoration results that deviate significantly from the ground truth. To balance perceptual quality and fidelity, we propose three key strategies: 1) a composite reward function tailored for face restoration assessment, 2) ground-truth-guided likelihood regularization, and 3) noise-level advantage assignment. Extensive experiments demonstrate that the proposed LRPO significantly improves face restoration quality over baseline methods and achieves state-of-the-art performance.
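To make the three strategies concrete, the following is a minimal, hypothetical sketch of a likelihood-regularized policy-gradient step in PyTorch. The reward function, the noise-level weights, the stand-in log-likelihoods, and the coefficient lambda_reg are illustrative assumptions introduced here for exposition, not the authors' actual implementation or hyperparameters.

```python
# Hypothetical sketch of an LRPO-style update: group-relative advantages from a
# composite reward, noise-level weighting, and a ground-truth likelihood regularizer.
import torch

def composite_reward(candidates: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
    # Placeholder reward: negative mean squared error to a reference image stands in
    # for the paper's composite face-restoration reward (perceptual/identity terms).
    return -((candidates - reference) ** 2).flatten(1).mean(dim=1)

def lrpo_style_loss(log_probs: torch.Tensor,
                    rewards: torch.Tensor,
                    gt_log_prob: torch.Tensor,
                    noise_weights: torch.Tensor,
                    lambda_reg: float = 0.1) -> torch.Tensor:
    """One policy-gradient step with ground-truth likelihood regularization.

    log_probs:     (K,) log-likelihoods of K sampled candidates under the policy
    rewards:       (K,) composite rewards of those candidates
    gt_log_prob:   scalar log-likelihood of the ground-truth image under the policy
    noise_weights: (K,) per-sample weights, e.g. derived from the diffusion noise level
    """
    # Group-relative advantages: center and scale rewards within the sampled group.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Noise-level advantage assignment: weight each sample's advantage by its noise level.
    weighted_adv = noise_weights * adv
    # REINFORCE-style term: raise the likelihood of high-reward candidates.
    pg_loss = -(weighted_adv.detach() * log_probs).mean()
    # Likelihood regularization: keep the ground truth probable under the policy.
    reg_loss = -lambda_reg * gt_log_prob
    return pg_loss + reg_loss

# Toy usage with random tensors standing in for K = 4 restored candidates.
K = 4
candidates = torch.rand(K, 3, 64, 64)
reference = torch.rand(3, 64, 64)
rewards = composite_reward(candidates, reference)
log_probs = torch.randn(K, requires_grad=True)      # stand-in for policy log-likelihoods
gt_log_prob = torch.randn((), requires_grad=True)   # stand-in for log p(ground truth)
noise_weights = torch.linspace(1.0, 0.5, K)         # illustrative noise-level weights
loss = lrpo_style_loss(log_probs, rewards, gt_log_prob, noise_weights)
loss.backward()
```

The regularizer is what distinguishes this sketch from a plain policy-gradient update: without the gt_log_prob term, maximizing the reward alone can drift away from the ground truth, which is the fidelity issue the abstract describes.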