Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition

The goal of unpaired image captioning (UIC) is to describe images without using image-caption pairs in the training phase. Although the task is challenging, we expect that it can be accomplished by leveraging a training set of images aligned with visual concepts. Most existing studies use off-the-shelf algorithms to obtain the visual concepts because the bounding box (BBox) labels or relationship-triplet labels needed for training are expensive to acquire. To address the problem of expensive annotations, we propose a novel approach to achieve cost-effective UIC. Specifically, we adopt image-level labels to optimize the UIC model in a weakly-supervised manner. For each image, we assume that only the image-level labels are available, without the locations or counts of the objects. The image-level labels are used to train a weakly-supervised object recognition model that extracts object information (e.g., instances) from an image, and the extracted instances are used to infer the relationships among different objects with an enhanced graph neural network (GNN). The proposed approach achieves comparable or even better performance than previous methods without the expensive cost of annotations. Furthermore, we design an unrecognized object (UnO) loss, combined with a visual concept reward, to improve the alignment of the inferred object and relationship information with the images. This effectively alleviates a problem encountered by existing UIC models: generating sentences that mention nonexistent objects. To the best of our knowledge, this is the first attempt to solve the problem of Weakly-Supervised visual concept recognition for UIC (WS-UIC) based only on image-level labels. Extensive experiments demonstrate that the proposed WS-UIC model achieves encouraging results on the COCO dataset while significantly reducing the cost of labeling.
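As a rough illustration of the idea of penalizing nonexistent objects, the sketch below scores a caption against the set of concepts recognized in an image. It is not the paper's UnO loss or visual concept reward, whose definitions the abstract does not give. The function name `concept_reward`, the toy vocabulary, and the scoring rule (+1 for each mentioned object that was recognized, -1 for each mentioned object that was not) are all assumptions made for illustration.

```python
def concept_reward(caption, recognized, object_vocab):
    """Toy score: reward mentioned objects that were recognized, penalize the rest.

    All names and the +1/-1 rule are hypothetical, not taken from the paper.
    """
    # Object words that appear in the caption (simple whitespace tokenization).
    mentioned = {w for w in caption.lower().split() if w in object_vocab}
    hits = mentioned & recognized          # mentioned objects the recognizer found
    hallucinated = mentioned - recognized  # mentioned objects the recognizer did not find
    return len(hits) - len(hallucinated)


# Hypothetical object vocabulary and recognizer output for one image.
object_vocab = {"dog", "cat", "frisbee", "bench"}
recognized = {"dog", "frisbee"}

print(concept_reward("a dog catches a frisbee", recognized, object_vocab))
print(concept_reward("a cat sits on a bench", recognized, object_vocab))
```

Under these assumptions, a caption that names only recognized objects scores higher than one that names objects absent from the recognizer's output, which is the kind of signal a reinforcement-style caption reward could use.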

