In this paper, we use a memory-augmented neural network to predict accurate answers to visual questions, even when those answers occur rarely in the training set. The memory network incorporates both internal and external memory blocks and selectively attends to each training exemplar. We show that memory-augmented neural networks can maintain a relatively long-term memory of scarce training exemplars, which is important for visual question answering (VQA) because answers follow a heavy-tailed distribution in the general VQA setting. Experimental results on two large-scale benchmark datasets show that the proposed algorithm performs favorably against the state of the art.