Existing perceptual similarity metrics assume that an image and its reference are well aligned. As a result, these metrics are often sensitive to small alignment errors that are imperceptible to the human eye. This paper studies the effect of small misalignment, specifically a small shift between the input and reference images, on existing metrics, and accordingly develops a shift-tolerant similarity metric. We build upon LPIPS, a widely used learned perceptual similarity metric, and explore architectural design considerations that make it robust against imperceptible misalignment. Specifically, we study a wide spectrum of neural network elements, such as anti-aliasing filtering, pooling, striding, padding, and skip connections, and discuss their roles in building a robust metric. Based on these studies, we develop a new deep neural network-based perceptual similarity metric. Our experiments show that our metric is tolerant to imperceptible shifts while remaining consistent with human similarity judgments.
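To make the role of anti-aliasing filtering concrete, below is a minimal PyTorch sketch of one of the design elements the abstract names: anti-aliased downsampling in the style of blur pooling (Zhang, 2019), where a fixed low-pass filter is applied before subsampling so that small input shifts perturb the extracted features less. The class name `BlurPool2d`, the binomial kernel choice, and the surrounding usage are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Anti-aliased downsampling: blur with a fixed low-pass filter, then subsample.

    Illustrative sketch only; the 3x3 binomial kernel follows common practice
    (Zhang, 2019) and is one of several choices the paper's study covers.
    """
    def __init__(self, channels: int, stride: int = 2):
        super().__init__()
        k = torch.tensor([1.0, 2.0, 1.0])
        k = torch.outer(k, k)          # separable binomial filter
        k = k / k.sum()                # normalize so the blur preserves mean intensity
        # One copy of the kernel per channel for a depthwise (grouped) convolution.
        self.register_buffer("kernel", k.view(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.stride = stride
        self.channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.pad(x, (1, 1, 1, 1), mode="reflect")
        return F.conv2d(x, self.kernel, stride=self.stride, groups=self.channels)

# A plain strided max pool aliases: a one-pixel input shift can change the
# pooled features substantially. Taking the max densely (stride 1) and then
# downsampling through the low-pass filter makes the feature extractor far
# less sensitive to such imperceptible shifts.
aliased = nn.MaxPool2d(kernel_size=2, stride=2)
antialiased = nn.Sequential(
    nn.MaxPool2d(kernel_size=2, stride=1),
    BlurPool2d(channels=64, stride=2),
)
```

Dropping such a module into a feature extractor like the VGG backbone used by LPIPS is one way to trade a small amount of spatial detail for substantially greater shift tolerance, which is the design space the paper explores.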