Image fusion methods, and the metrics used to evaluate them, have conventionally relied on pixel-based or other low-level features. For many applications, however, the aim of image fusion is to combine the semantic content of the input images effectively. This paper proposes a novel system for the semantic combination of visual content using pre-trained CNN architectures. The proposed semantic fusion is initiated by fusing the top-layer feature map outputs (one set per input image) and is realised through gradient updating of the fused image input (so-called image optimisation). Simple "choose maximum" and "local majority" filter-based fusion rules are used for feature map fusion. This provides a simple method for combining layer outputs, and thus a unique framework for fusing single-channel and colour images within a decomposition that is pre-trained for classification and therefore aligned with semantic fusion. Furthermore, class activation mappings of each input image are used to combine semantic information at a higher level. The developed methods give low-level fusion performance equivalent to that of state-of-the-art methods while providing a unique architecture for combining semantic information from multiple images.
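The two feature map fusion rules named above can be illustrated with a minimal sketch. The abstract does not give their exact definitions, so the code below is one common reading rather than the authors' implementation: "choose maximum" keeps the larger activation at each position, and "local majority" smooths the binary selection mask with a majority vote over a small window before selecting. The function names, the 3x3 window, the edge padding, and the restriction to single 2-D feature maps are all assumptions made for illustration.

```python
import numpy as np


def choose_max(feat_a, feat_b):
    """Element-wise 'choose maximum': keep the larger activation at each position."""
    return np.maximum(feat_a, feat_b)


def local_majority_mask(mask, size=3):
    """Majority-vote a boolean decision mask over a size x size window.

    Assumption: the window size is odd and borders are handled by edge padding.
    """
    pad = size // 2
    padded = np.pad(mask.astype(np.int32), pad, mode="edge")
    h, w = mask.shape
    votes = np.zeros((h, w), dtype=np.int32)
    # Sum the shifted copies of the mask to count votes inside each window.
    for dy in range(size):
        for dx in range(size):
            votes += padded[dy:dy + h, dx:dx + w]
    return votes > (size * size) // 2


def fuse_local_majority(feat_a, feat_b, size=3):
    """Select from feat_a or feat_b using a locally consistent decision mask."""
    mask = feat_a >= feat_b                    # raw per-position decision
    smoothed = local_majority_mask(mask, size)  # remove isolated decisions
    return np.where(smoothed, feat_a, feat_b)


if __name__ == "__main__":
    a = np.ones((5, 5))
    b = np.zeros((5, 5))
    b[2, 2] = 2.0  # a single isolated position where b wins
    print(choose_max(a, b)[2, 2])           # choose-max keeps the isolated value from b
    print(fuse_local_majority(a, b)[2, 2])  # the majority vote overrides it with a
```

Under this reading, the majority filter trades a small loss of peak activations for spatially consistent selections. A multi-channel feature tensor could be handled by applying the same functions to each channel in turn.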