Visual question answering is set to become crucial for numerous human activities, yet it poses major challenges that lie at the heart of the artificial intelligence endeavor. This paper presents an update on the rapid advances in image-based visual question answering over the last couple of years. A large body of recent research has focused on improving visual question answering system architectures, highlighting the importance of multimodal architectures. The review by Manmadhan et al. (2020), on which the present article builds, discusses several benefits of visual question answering; this article extends that review with subsequent developments in the field.