Prompt-based Learning for Unpaired Image Captioning

Unpaired Image Captioning (UIC) has been developed to learn image descriptions from unaligned vision-language sample pairs. Existing schemes usually adopt a visual concept reward from reinforcement learning to align visual concepts with images. However, this cross-domain alignment is usually weak, which severely constrains the overall performance of these schemes. Recent successes of Vision-Language Pre-Trained Models (VL-PTMs) have triggered the development of prompt-based learning from VL-PTMs. In this paper we present a novel prompt-based scheme for training the UIC model, one that makes the best use of the strong generalization ability and abundant vision-language prior knowledge of VL-PTMs. We adopt the CLIP model for this research in unpaired image captioning. Specifically, the images are taken as input to the prompt generation module, which contains the pre-trained model as well as one feed-forward layer for prompt extraction. Then, the input images and generated prompts are aggregated for unpaired adversarial captioning learning. To further enhance captioning performance, we design a high-quality pseudo caption filter, guided by the CLIP logits, that measures the correlation between predicted captions and the corresponding images. This allows us to improve the captioning model in a supervised learning manner. Extensive experiments on the COCO and Flickr30K datasets have been carried out to validate the superiority of the proposed model. The model achieves state-of-the-art performance on the COCO dataset, outperforming the best UIC model by 1.9% on the BLEU-4 metric. We expect that the proposed prompt-based UIC model will inspire a new line of research for VL-PTM-based captioning.
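The abstract does not say how the pseudo caption filter turns CLIP logits into a keep-or-drop decision. As a rough illustration only, the sketch below assumes the filter keeps an (image, predicted caption) pair when the cosine similarity of their embeddings reaches a fixed threshold. The function names, the threshold value, and the toy 2-D embeddings are all hypothetical; in practice the embeddings would come from CLIP's image and text encoders, and CLIP logits are these cosine similarities scaled by a learned temperature.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def filter_pseudo_captions(
    image_embs: Sequence[Sequence[float]],
    caption_embs: Sequence[Sequence[float]],
    captions: Sequence[str],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> List[Tuple[int, str, float]]:
    """Keep (index, caption, score) for pairs whose similarity >= threshold."""
    kept = []
    for idx, (img, cap_emb, cap) in enumerate(
        zip(image_embs, caption_embs, captions)
    ):
        score = cosine(img, cap_emb)
        if score >= threshold:
            kept.append((idx, cap, score))
    return kept


# Toy embeddings standing in for CLIP outputs (assumption for illustration).
image_embs = [[1.0, 0.0], [0.0, 1.0]]
caption_embs = [[0.9, 0.1], [1.0, 0.0]]
captions = ["a dog on grass", "a red bus"]

kept = filter_pseudo_captions(image_embs, caption_embs, captions)
for idx, cap, score in kept:
    print(idx, cap, round(score, 3))
```

The pairs that survive such a filter could then serve as pseudo-paired data for the supervised stage the abstract mentions. A fixed threshold is only one option; ranking the captions for each image and keeping the top few would be another.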
