Non-visual imaging sensors are widely used in industry for a variety of purposes. These sensors are typically more expensive than visual (RGB) sensors and usually produce images of lower resolution. To this end, Cross-Modality Super-Resolution methods were introduced, in which a high-resolution RGB image assists in increasing the resolution of a low-resolution modality. However, fusing images from different modalities is not a trivial task; the output must be artifact-free and remain faithful to the characteristics of the target modality. Moreover, the input images are never perfectly aligned, which introduces further artifacts during the fusion process. We present CMSR, a deep network for Cross-Modality Super-Resolution which, unlike previous methods, is designed to handle weakly aligned images. The network is trained on the two input images only; it learns their internal statistics and correlations and applies them to up-sample the target modality. CMSR contains an internal transformer that is trained on-the-fly together with the up-sampling process itself, without explicit supervision. We show that CMSR succeeds in increasing the resolution of the input image, gaining valuable information from its RGB counterpart, yet in a conservative way, without introducing artifacts or irrelevant details.
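To make the internal-learning idea concrete, the following is a minimal PyTorch sketch of a self-supervised cross-modality SR setup in the spirit of the abstract: a small alignment module warps the high-resolution RGB guide toward the up-sampled target modality, a fusion network predicts a residual over naive up-sampling, and the only training signal is reconstructing the low-resolution input after downsampling the prediction. All module names (`AlignmentModule`, `CMSRNet`, `train_step`), layer sizes, and loss choices are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of the internal-learning scheme described in the abstract.
# Architecture, losses, and hyper-parameters are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignmentModule(nn.Module):
    """Stand-in for the internal transformer: predicts a dense offset field
    that warps the RGB guide toward the weakly aligned target modality."""
    def __init__(self):
        super().__init__()
        self.offsets = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),  # per-pixel (dx, dy) offsets
        )

    def forward(self, rgb, target_up):
        flow = self.offsets(torch.cat([rgb, target_up], dim=1))
        b, _, h, w = rgb.shape
        # Base sampling grid in normalized [-1, 1] coordinates, plus offsets.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).to(rgb)
        grid = grid + flow.permute(0, 2, 3, 1)
        return F.grid_sample(rgb, grid, align_corners=True)

class CMSRNet(nn.Module):
    """Fusion network: up-samples the target modality guided by aligned RGB."""
    def __init__(self, scale=2):
        super().__init__()
        self.scale = scale
        self.align = AlignmentModule()
        self.fuse = nn.Sequential(
            nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, target_lr, rgb_hr):
        target_up = F.interpolate(target_lr, scale_factor=self.scale,
                                  mode="bilinear", align_corners=False)
        rgb_aligned = self.align(rgb_hr, target_up)
        # Residual prediction keeps the output close to the target modality.
        return target_up + self.fuse(torch.cat([rgb_aligned, target_up], dim=1))

def train_step(model, opt, target_lr, rgb_hr):
    """One internal-training step: the only supervision is the low-res target
    itself, reconstructed by downsampling the super-resolved prediction."""
    opt.zero_grad()
    sr = model(target_lr, rgb_hr)
    rec = F.interpolate(sr, size=target_lr.shape[-2:], mode="bilinear",
                        align_corners=False)
    loss = F.l1_loss(rec, target_lr)
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    model = CMSRNet(scale=2)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    target_lr = torch.rand(1, 1, 64, 64)   # e.g. a low-res thermal image
    rgb_hr = torch.rand(1, 3, 128, 128)    # weakly aligned high-res RGB guide
    for _ in range(3):
        print(train_step(model, opt, target_lr, rgb_hr))
```

Note the design choice this sketch mirrors: because training uses only the two input images, the network can learn the scene-specific correlation between modalities without any external dataset, and the residual formulation biases it toward conservative fusion.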