This paper discusses and demonstrates the outcomes from our experimentation on Image Captioning. Image captioning is a much more involved task than image recognition or classification, because of the additional challenge of recognizing the interdependence between the objects/concepts in the image and the creation of a succinct sentential narration. Experiments on several labeled datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. As a toy application, we apply image captioning to create video captions, and we advance a few hypotheses on the challenges we encountered.
翻译:本文讨论并展示了我们在图像说明方面的实验结果。 图像说明比图像识别或分类要复杂得多, 因为承认图像中对象/概念之间的相互依存关系和创建简洁的感知解说会是一项额外的挑战。 在几个标签数据集上进行的实验显示了模型的准确性和它仅从图像描述中学习的语言的流畅性。 作为玩具应用,我们使用图像说明来创建视频说明,我们提出一些关于我们所遇到挑战的假设。