I Can't Believe There's No Images! Learning Visual Tasks Using only Language Data

Many high-level skills that are required for computer vision tasks, such as parsing questions, comparing and contrasting semantics, and writing descriptions, are also required in other domains such as natural language processing. In this paper, we ask whether this makes it possible to learn those skills from text data and then use them to complete vision tasks without ever training on visual training data. Key to our approach is exploiting the joint embedding space of contrastively trained vision and language encoders. In practice, there can be systematic differences between embedding spaces for different modalities in contrastive models, and we analyze how these differences affect our approach and study a variety of strategies to mitigate this concern. We produce models using only text training data on three tasks: image captioning, visual entailment and visual question answering, and evaluate them on standard benchmarks using images. We find that this kind of transfer is possible and results in only a small drop in performance relative to models trained on images. We also showcase a variety of stylistic image captioning models that were trained using no image data and no human-curated language data, but instead text data from books, the web, or language models.
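The core idea above, training on text embeddings and then feeding image embeddings to the same model at test time, can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration: `make_embeddings` is a hypothetical stand-in for a frozen contrastive encoder, the constant `gap` vector mimics a systematic difference between modalities, and Gaussian noise injection is just one plausible mitigation strategy (the abstract does not say which strategies the authors study). It is not the authors' implementation.

```python
import numpy as np

def normalize(x):
    """Scale each row to unit L2 norm, as contrastive encoders typically do."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def make_embeddings(labels, dim, rng, gap=None, jitter=0.05):
    """Hypothetical stand-in for a frozen encoder: class prototype plus jitter.
    `gap` is a constant offset mimicking a systematic modality difference."""
    x = np.eye(dim)[labels] + jitter * rng.standard_normal((len(labels), dim))
    if gap is not None:
        x = x + gap
    return normalize(x)

def train_linear_head(x, y, num_classes, rng, noise_std=0.1, steps=300, lr=1.0):
    """Softmax regression on text vectors; fresh Gaussian noise is added each step
    (an assumed mitigation for the modality gap)."""
    w = np.zeros((x.shape[1], num_classes))
    onehot = np.eye(num_classes)[y]
    for _ in range(steps):
        noisy = normalize(x + noise_std * rng.standard_normal(x.shape))
        logits = noisy @ w
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        w -= lr * noisy.T @ (probs - onehot) / len(x)  # cross-entropy gradient step
    return w

def predict(w, x):
    """Return the highest-scoring class for each embedding."""
    return (x @ w).argmax(axis=1)

rng = np.random.default_rng(0)
dim, num_classes = 32, 4
gap = 0.3 * normalize(rng.standard_normal((1, dim)))

train_labels = rng.integers(0, num_classes, size=400)
test_labels = rng.integers(0, num_classes, size=200)

text_train = make_embeddings(train_labels, dim, rng)           # "text" side, no gap
image_test = make_embeddings(test_labels, dim, rng, gap=gap)   # "image" side, shifted

w = train_linear_head(text_train, train_labels, num_classes, rng)
accuracy = float((predict(w, image_test) == test_labels).mean())
print(f"accuracy on image-like vectors: {accuracy:.2f}")
```

In a real setting, the two `make_embeddings` calls would be replaced by the text and image encoders of a contrastively trained model, and the linear head by a task model such as a captioner or question-answering network.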