Video Question Answering (VQA) is a recently emerging and challenging task in the field of Computer Vision. It was preceded by several related visual information retrieval tasks, such as Video Captioning/Description and Video-guided Machine Translation. VQA helps to retrieve and interpret temporal and spatial information from video scenes. In this survey, we review a number of methods and datasets for the task of VQA. To the best of our knowledge, no previous survey has been conducted for this task.