拆分的新奇天体说明器 (Decoupled Novel Object Captioner)

Image captioning is a challenging task where the machine automatically describes an image by sentences or phrases. It often requires a large number of paired image-sentence annotations for training. However, a pre-trained captioning model can hardly be applied to a new domain in which some novel object categories exist, i.e., the objects and their description words are unseen during model training. To correctly caption the novel object, it requires professional human workers to annotate the images by sentences with the novel words. It is labor expensive and thus limits its usage in real-world applications. In this paper, we introduce the zero-shot novel object captioning task where the machine generates descriptions without extra sentences about the novel object. To tackle the challenging problem, we propose a Decoupled Novel Object Captioner (DNOC) framework that can fully decouple the language sequence model from the object descriptions. DNOC has two components. 1) A Sequence Model with the Placeholder (SM-P) generates a sentence containing placeholders. The placeholder represents an unseen novel object. Thus, the sequence model can be decoupled from the novel object descriptions. 2) A key-value object memory built upon the freely available detection model, contains the visual information and the corresponding word for each object. The SM-P will generate a query to retrieve the words from the object memory. The placeholder will then be filled with the correct word, resulting in a caption with novel object descriptions. The experimental results on the held-out MSCOCO dataset demonstrate the ability of DNOC in describing novel concepts in the zero-shot novel object captioning task.

翻译：图像字幕是一个具有挑战性的任务, 机器会自动用句子或句子描述图像。它通常需要大量的配对图像描述说明来进行培训。但是, 一个经过预先训练的字幕模型很难应用到新领域, 新对象类别存在, 即, 对象及其描述词在模式培训期间是看不见的。要正确描述该新对象, 它需要专业的人类工作者用新词来说明图像。它需要劳动成本高, 从而限制其在现实世界应用程序中的使用。在本文中, 我们引入一个零射新对象说明任务, 机器生成描述时没有额外能力的新对象描述。但是, 要解决这个具有挑战性的问题, 我们建议一个可以完全将语言序列模型与新对象描述调和新对象的描述进行调和。 DNOC 有两个组成部分。 1 与占位符( SM- P) 一起生成一个包含占位符的句子句子。占位符代表着一个新发现的新对象。因此, 序列模型可以从新对象的物体描述中解析出关于新对象描述新对象描述新对象的能力。要显示的轨道, 将显示每个可获取的轨道。。将创建目标显示的轨道, 将创建目标记录中, 将显示的轨道。

相关内容

MoDELS

关注 43

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR2020-Facebook】从检测到3D目标，FroDO: From Detections to 3D Objects

专知会员服务

33+阅读 · 2020年5月12日

3D目标检测进展综述

专知会员服务

193+阅读 · 2020年4月24日

【DeepMind】PolyGen: 一种三维网格的自回归生成模型，PolyGen: An Autoregressive Generative Model of 3D Meshes

专知会员服务

37+阅读 · 2020年2月27日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日