Detection and Captioning with Unseen Object Classes

Image caption generation is one of the most challenging problems at the intersection of the visual recognition and natural language modeling domains. In this work, we propose and study a practically important variant of this problem in which test images may contain visual objects with no corresponding visual or textual training examples. For this problem, we propose a detection-driven approach based on a generalized zero-shot detection model and a template-based sentence generation model. To improve the detection component, we jointly define a class-to-class similarity-based class representation and a practical score calibration mechanism. We also propose a novel evaluation metric that provides complementary insights into the captioning outputs by separately handling the visual and non-visual components of the captions. Our experiments show that the proposed zero-shot detection model obtains state-of-the-art performance on the MS-COCO dataset and that the zero-shot captioning approach yields promising results.
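The abstract does not spell out how the class-to-class similarity representation or the score calibration work. The sketch below is one plausible reading under stated assumptions, not the authors' implementation. It assumes every class name has a word embedding (toy 3-D vectors here) and represents each class, seen or unseen, by its cosine similarities to the seen classes. A region is scored by a dot product between a hypothetical `region_feat` (for example, the detector's seen-class scores for that region) and each class representation. Calibration subtracts a constant `gamma` from seen-class scores, a technique known as calibrated stacking in generalized zero-shot learning. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_to_class_repr(class_emb, seen_names):
    """Hypothetical: represent each class by its vector of cosine
    similarities to the word embeddings of the seen classes."""
    return {
        name: [cosine(emb, class_emb[s]) for s in seen_names]
        for name, emb in class_emb.items()
    }


def calibrated_scores(region_feat, reps, seen_names, gamma):
    """Score every class for one region, then penalize seen classes by
    gamma (calibrated stacking) so unseen classes can compete."""
    if len(region_feat) != len(seen_names):
        raise ValueError("region_feat must have one entry per seen class")
    seen = set(seen_names)
    scores = {}
    for name, rep in reps.items():
        score = sum(a * b for a, b in zip(region_feat, rep))
        scores[name] = score - gamma if name in seen else score
    return scores


def predict(region_feat, reps, seen_names, gamma):
    """Return the highest-scoring class label after calibration."""
    scores = calibrated_scores(region_feat, reps, seen_names, gamma)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy embeddings: "tiger" is unseen but close to the seen class "cat".
    emb = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "tiger": [0.8, 0.0, 0.6]}
    seen = ["cat", "dog"]
    reps = class_to_class_repr(emb, seen)
    region = [0.9, 0.1]  # hypothetical seen-class scores for one region
    print(predict(region, reps, seen, gamma=0.0))  # uncalibrated
    print(predict(region, reps, seen, gamma=0.3))  # calibrated
```

Without calibration, the seen class `cat` wins (0.9 against 0.72 for `tiger`); with `gamma=0.3` the penalized seen score drops to 0.6 and the unseen class `tiger` is predicted instead. In practice `gamma` trades off seen against unseen accuracy and would be tuned on held-out data.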

