Human brains integrate linguistic and perceptual information simultaneously to understand natural language, and possess the critical ability to render imaginations. Such abilities enable us to construct new abstract concepts or concrete objects, and are essential for applying practical knowledge to solve problems in low-resource scenarios. However, most existing methods for Natural Language Understanding (NLU) focus mainly on textual signals. They do not simulate the human visual imagination ability, which hinders models from inferring and learning efficiently from limited data samples. Therefore, we introduce an Imagination-Augmented Cross-modal Encoder (iACE) that solves natural language understanding tasks from a novel learning perspective: imagination-augmented cross-modal understanding. iACE enables visual imagination with external knowledge transferred from powerful generative models and pre-trained vision-and-language models. Extensive experiments on GLUE and SWAG show that iACE achieves consistent improvements over visually-supervised pre-trained models. More importantly, results in both extreme and normal few-shot settings validate the effectiveness of iACE in low-resource NLU scenarios.
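To make the imagination-augmented idea concrete, below is a minimal sketch of the pipeline the abstract describes; it is not the authors' implementation. All module names, dimensions, and the fusion scheme are illustrative assumptions: a stand-in "imaginator" plays the role of the generative model producing a pseudo-visual embedding from text features, and a cross-attention layer plays the role of the cross-modal encoder fusing text with the imagination before classification.

```python
# Hypothetical sketch of an imagination-augmented cross-modal classifier.
# NOT the iACE implementation; every component here is a simplified stand-in.
import torch
import torch.nn as nn

class ImaginationAugmentedClassifier(nn.Module):
    def __init__(self, d_model=256, n_classes=2, vocab_size=30522):
        super().__init__()
        # Stand-in for a pre-trained language model (e.g., a Transformer encoder).
        self.text_encoder = nn.Sequential(
            nn.Embedding(vocab_size, d_model),
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
                num_layers=2,
            ),
        )
        # Stand-in for a text-to-image generator plus visual encoder: maps the
        # pooled text representation to an "imagined" visual embedding.
        self.imaginator = nn.Linear(d_model, d_model)
        # Stand-in for the cross-modal encoder: text tokens attend to the imagination.
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(2 * d_model, n_classes)

    def forward(self, token_ids):
        t = self.text_encoder(token_ids)                    # (B, L, d) contextual text features
        img = self.imaginator(t.mean(dim=1, keepdim=True))  # (B, 1, d) imagined visual embedding
        fused, _ = self.cross_attn(t, img, img)             # (B, L, d) imagination-conditioned text
        pooled = torch.cat([t.mean(dim=1), fused.mean(dim=1)], dim=-1)
        return self.classifier(pooled)

# Usage: classify a batch of two token sequences of length 8.
logits = ImaginationAugmentedClassifier()(torch.randint(0, 30522, (2, 8)))
print(logits.shape)  # torch.Size([2, 2])
```

Concatenating the plain-text and imagination-fused representations before the classifier is one simple design choice for combining the two views; the actual model may fuse modalities differently.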