Automatic image captioning, which involves describing the contents of an image, is a challenging problem with applications across many research fields. One notable example is designing assistants for the visually impaired. Recent breakthroughs in deep learning have led to significant advances in image captioning methods. This survey provides a structured review of recent image captioning techniques and their performance, focusing mainly on deep learning methods. We also review widely used datasets and performance metrics, and discuss open problems and unsolved challenges in image captioning.