Quantifying the perceptual similarity of two images is a long-standing problem in low-level computer vision. In the natural image domain, this commonly relies on supervised learning, e.g., a pre-trained VGG, to obtain a latent representation. However, due to domain shift, models pre-trained on natural images might not transfer to other image domains, such as medical imaging. Notably, in medical imaging, perceptual similarity is evaluated exclusively by specialists trained extensively in diverse medical fields. Thus, medical imaging still lacks task-specific, objective perceptual measures. This work asks: is it necessary to rely on supervised learning to obtain an effective representation for measuring perceptual similarity, or is self-supervision sufficient? To understand whether recent contrastive self-supervised representations (CSR) can fill this gap, we start with natural images, systematically evaluate CSR as a metric across numerous contemporary architectures and tasks, and compare it with existing methods. We find that in the natural image domain, CSR performs on par with supervised representations on several perceptual tests, and in the medical domain, CSR better quantifies perceptual similarity with respect to the experts' ratings. We also demonstrate that CSR can significantly improve image quality in two image synthesis tasks. Finally, our extensive results suggest that perceptuality is an emergent property of CSR, which can be adapted to many image domains without requiring annotations.
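To make the idea of using a latent representation as a perceptual metric concrete, the following is a minimal sketch. It is not the method evaluated in this work: the abstract does not specify how distances are computed, so the cosine distance between embeddings is an assumption, and `toy_encoder` is a hypothetical stand-in (a fixed random projection with a ReLU) for a real pre-trained supervised or contrastive self-supervised encoder.

```python
import numpy as np


def toy_encoder(image, weights):
    """Hypothetical stand-in for a pre-trained encoder: flatten, project, ReLU."""
    return np.maximum(weights @ image.ravel(), 0.0)


def perceptual_distance(img_a, img_b, encode):
    """Cosine distance between the embeddings of two images (assumed metric)."""
    za, zb = encode(img_a), encode(img_b)
    za = za / (np.linalg.norm(za) + 1e-12)  # unit-normalize each embedding
    zb = zb / (np.linalg.norm(zb) + 1e-12)
    return 1.0 - float(za @ zb)


rng = np.random.default_rng(0)
weights = rng.standard_normal((64, 16 * 16))  # fixed random projection


def encode(img):
    return toy_encoder(img, weights)


# Synthetic 16x16 "images": a reference, a lightly perturbed copy,
# and an independently drawn image.
reference = rng.random((16, 16))
slightly_noisy = np.clip(reference + 0.01 * rng.standard_normal((16, 16)), 0.0, 1.0)
unrelated = rng.random((16, 16))

d_noisy = perceptual_distance(reference, slightly_noisy, encode)
d_unrelated = perceptual_distance(reference, unrelated, encode)
print(f"reference vs. noisy copy: {d_noisy:.6f}")
print(f"reference vs. unrelated:  {d_unrelated:.6f}")
```

In practice, `encode` would wrap a trained network, and the resulting distances would be compared against human or expert similarity ratings, as the abstract describes for the natural and medical image domains.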