Inspired by how the human brain employs a greater number of neural pathways when describing a highly focused subject, we show that deep attentive models used for image captioning, a central vision-language task, can be extended to achieve better performance. Image captioning bridges the gap between computer vision and natural language processing. Automated image captioning serves as a tool to eliminate the need for a human agent to create descriptive captions for unseen images. Automated image captioning is challenging yet interesting. One reason is that AI-based systems capable of generating sentences that describe an input image could be used in a wide variety of tasks beyond generating captions for unseen images found on the web or uploaded to social media. For example, in biology and medical sciences, such systems could provide researchers and physicians with a brief linguistic description of relevant images, potentially expediting their work.