Fusing a sequence of perfectly aligned images captured at various exposures has shown great potential for achieving High Dynamic Range (HDR) imaging with sensors of limited dynamic range. However, in the presence of large motion of scene objects or the camera, misalignment is almost inevitable and leads to the notorious ``ghost'' artifacts. Besides, factors such as noise in dark regions or color saturation in over-bright regions may also prevent local image details from being transferred to the HDR image. This paper presents a novel multi-exposure fusion model based on the Swin Transformer. In particular, we design feature selection gates, integrated with the feature extraction layers, to detect outliers and block them from HDR image synthesis. To reconstruct the missing local details from well-aligned and properly exposed regions, we exploit long-distance contextual dependencies in the exposure-space pyramid via the self-attention mechanism. Extensive numerical and visual evaluations have been conducted on a variety of benchmark datasets. The experiments show that our model achieves accuracy on par with current top-performing multi-exposure HDR imaging models, while gaining higher efficiency.