Visual Question Answering (VQA) is a multi-modal task that involves answering questions about an input image by semantically understanding its contents and responding in natural language. Applying VQA to disaster management is an important line of research because of the range of assessment questions such a system can answer. However, the main challenge is the delay caused by label generation when assessing the affected areas. To tackle this, we deploy the pre-trained CLIP model, which is trained on image-text pairs. However, we empirically find that the model has poor zero-shot performance. We therefore instead use the pre-trained text and image embeddings from this model for supervised training, and surpass previous state-of-the-art results on the FloodNet dataset. We extend this to a continual setting, which is a more realistic scenario. We tackle the problem of catastrophic forgetting using various experience replay methods. Our training runs are available at: https://wandb.ai/compyle/continual_vqa_final. Our code is available at https://github.com/AdityaKane2001/continual_vqa.
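Concretely, the approach reduces to training a small answer classifier on top of frozen CLIP text and image embeddings. Below is a minimal sketch assuming the Hugging Face `transformers` checkpoint `openai/clip-vit-base-patch32`; the concatenation-based fusion, the `VQAHead` module, the answer-vocabulary size, and the file name are illustrative assumptions, not the exact architecture from the paper.

```python
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Frozen CLIP backbone; only the small answer head below is trained.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

class VQAHead(nn.Module):
    """Classifies a fused (image, question) embedding into a fixed answer set."""
    def __init__(self, embed_dim=512, num_answers=41):  # num_answers: placeholder size
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, 256),  # concatenated image + text embeddings
            nn.ReLU(),
            nn.Linear(256, num_answers),
        )

    def forward(self, image_emb, text_emb):
        return self.mlp(torch.cat([image_emb, text_emb], dim=-1))

head = VQAHead()

image = Image.open("flood_tile.png")  # hypothetical FloodNet image
question = "Is the road flooded?"
inputs = processor(text=[question], images=image,
                   return_tensors="pt", padding=True)

with torch.no_grad():  # embeddings come from the frozen model
    img_emb = clip.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = clip.get_text_features(input_ids=inputs["input_ids"],
                                     attention_mask=inputs["attention_mask"])

logits = head(img_emb, txt_emb)  # train with cross-entropy on answer labels
```

In the continual setting, experience replay then amounts to keeping a small buffer of (embedding, answer) pairs from earlier tasks and mixing them into the batches of each new task, so the answer head is regularized against catastrophic forgetting.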