Image captioning generates text that describes the scene in an input image. It has been developed for high-quality images taken in clear weather. However, in bad weather conditions such as heavy rain, snow, and dense fog, poor visibility caused by rain streaks, rain accumulation, and snowflakes severely degrades image quality. This hinders the extraction of useful visual features and thus lowers image captioning performance. To address this practical issue, this study introduces a new encoder for captioning heavy rain images. The central idea is to transform the features extracted from heavy rain input images into semantic visual features associated with words and sentence context. To achieve this, a target encoder is first trained in an encoder-decoder framework to associate visual features with semantic words. Next, the objects in a heavy rain image are made visible by an initial reconstruction subnetwork (IRS) based on a heavy rain model. The IRS is then combined with a semantic visual feature matching subnetwork (SVFMS), which matches the output features of the IRS to the semantic visual features of the pretrained target encoder. The proposed encoder, based on the joint learning of the IRS and SVFMS, is trained end to end and then connected to the pretrained decoder for image captioning. Experiments demonstrate that the proposed encoder can generate semantic visual features associated with words even from heavy rain images, thereby increasing the accuracy of the generated captions.
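The abstract does not specify the heavy rain model or the training loss, so the following is only a minimal sketch under stated assumptions. It assumes a commonly used heavy rain formulation, I = T(J + S) + (1 - T)A, where J is the clean background, S the rain streak layer, T the transmission map, and A the atmospheric light, and shows how an IRS-style step could invert it once S, T, and A are estimated. It then sketches a hypothetical joint objective that adds a pixel reconstruction term to a feature matching term between the SVFMS output and the target encoder's features of the clean image. The names `svfms`, `target_encoder`, and `lam` are illustrative placeholders, not the authors' implementation; images are flattened to plain lists of pixel values for simplicity.

```python
from typing import Callable, List

Vector = List[float]


def synthesize_heavy_rain(J: Vector, S: Vector, T: Vector, A: float) -> Vector:
    """Assumed heavy rain model, per pixel: I = T * (J + S) + (1 - T) * A."""
    return [t * (j + s) + (1.0 - t) * A for j, s, t in zip(J, S, T)]


def invert_heavy_rain(I: Vector, S: Vector, T: Vector, A: float,
                      eps: float = 1e-6) -> Vector:
    """Solve the assumed model for J, given estimates of S, T, and A.

    In a learned IRS these estimates would come from the network; here they
    are passed in directly. eps guards against division by a zero transmission.
    """
    return [(i - (1.0 - t) * A) / max(t, eps) - s for i, s, t in zip(I, S, T)]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_loss(J_hat: Vector, J_clean: Vector,
               svfms: Callable[[Vector], Vector],
               target_encoder: Callable[[Vector], Vector],
               lam: float = 1.0) -> float:
    """Hypothetical joint objective: reconstruction + semantic feature matching.

    The target encoder is pretrained and frozen; only the IRS and SVFMS
    would be updated when minimizing this loss.
    """
    reconstruction = mse(J_hat, J_clean)
    matching = mse(svfms(J_hat), target_encoder(J_clean))
    return reconstruction + lam * matching


if __name__ == "__main__":
    J = [0.2, 0.5, 0.8]          # toy clean background
    S = [0.1, 0.0, 0.3]          # toy rain streak layer
    T = [0.7, 0.9, 0.5]          # toy transmission map
    A = 0.9                      # toy atmospheric light
    I = synthesize_heavy_rain(J, S, T, A)
    J_hat = invert_heavy_rain(I, S, T, A)

    # Toy stand-in encoders (hypothetical): mean and max of the pixels.
    def toy_encoder(x: Vector) -> Vector:
        return [sum(x) / len(x), max(x)]

    print(J_hat, joint_loss(J_hat, J, toy_encoder, toy_encoder))
```

With exact estimates of S, T, and A the inversion recovers J and both loss terms vanish; in practice the IRS only approximates these quantities, which is why the feature matching term is needed to pull its output toward the target encoder's semantic features.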