Infrared cameras are often used to enhance night vision, since visible light cameras perform poorly without sufficient illumination. However, infrared data suffer from poor color contrast and limited representation ability due to the intrinsic heat-related imaging principle. This makes the captured information difficult for humans to interpret and analyze, and hinders its application. Although the domain gap between unpaired nighttime infrared and daytime visible videos is even larger than that between paired videos captured at the same time, establishing an effective translation mapping would greatly benefit various fields. In this case, the structural knowledge within nighttime infrared videos and the semantic information contained in the translated daytime visible counterparts could be exploited simultaneously. To this end, we propose a tailored framework, ROMA, coupled with our cRoss-domain regiOn siMilarity mAtching technique for bridging these huge gaps. Specifically, ROMA efficiently translates unpaired nighttime infrared videos into fine-grained daytime visible ones, while maintaining spatiotemporal consistency by matching cross-domain region similarity. Furthermore, we design a multiscale region-wise discriminator to distinguish the details of synthesized visible results from real references. Extensive experiments and evaluations on specific applications indicate that ROMA outperforms state-of-the-art methods. Moreover, we provide a new and challenging dataset, named InfraredCity, to encourage further research on unpaired nighttime infrared and daytime visible video translation. In particular, it consists of 9 long video clips covering City, Highway, and Monitor scenarios. All clips can be split into 603,142 frames in total, which is 20 times larger than the recently released daytime infrared-to-visible dataset IRVI.
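To make the region similarity matching idea more concrete, below is a minimal sketch of one plausible way such an objective could be implemented, assuming encoder features are available for the source infrared frame and the translated visible frame; the `region_similarity` helper, the 8x8 region grid, and the L1 matching loss are illustrative assumptions rather than ROMA's exact published formulation.

```python
import torch
import torch.nn.functional as F

def region_similarity(feat, grid=8):
    """Pool a feature map into a grid of regions and compute the
    pairwise cosine similarity between all regions.

    feat: (B, C, H, W) feature map from an encoder.
    Returns: (B, grid*grid, grid*grid) region-to-region similarity matrix.
    """
    pooled = F.adaptive_avg_pool2d(feat, grid)       # (B, C, grid, grid)
    regions = pooled.flatten(2).transpose(1, 2)      # (B, grid*grid, C)
    regions = F.normalize(regions, dim=-1)           # unit-norm region descriptors
    return regions @ regions.transpose(1, 2)         # cosine similarity matrix

def cross_domain_region_matching_loss(feat_infrared, feat_translated, grid=8):
    """Encourage the translated visible frame to preserve the relational
    structure (region-to-region similarity) of the source infrared frame."""
    sim_src = region_similarity(feat_infrared, grid)
    sim_tgt = region_similarity(feat_translated, grid)
    return F.l1_loss(sim_tgt, sim_src)

# Hypothetical usage: features from a shared encoder for one infrared frame
# and its translated visible counterpart.
feat_ir = torch.randn(2, 256, 32, 32)
feat_vis = torch.randn(2, 256, 32, 32)
loss = cross_domain_region_matching_loss(feat_ir, feat_vis)
```

Matching the similarity structure across domains, rather than the features themselves, is what allows supervision to cross the large appearance gap between nighttime infrared and daytime visible frames: only the relations among regions need to be preserved, not their raw values.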