Generating natural and accurate descriptions in image captioning has always been a challenge. In this paper, we propose a novel recall mechanism to imitate the way humans conduct captioning. There are three parts in our recall mechanism: recall unit, semantic guide (SG), and recalled-word slot (RWS). The recall unit is a text-retrieval module designed to retrieve recalled words for images. SG and RWS are designed to make the best use of the recalled words. The SG branch generates a recalled context, which guides the process of generating the caption. The RWS branch is responsible for copying recalled words to the caption. Inspired by the pointing mechanism in text summarization, we adopt a soft switch to balance the generated-word probabilities between SG and RWS. In the CIDEr optimization step, we also introduce an individual recalled-word reward (WR) to boost training. Our proposed methods (SG+RWS+WR) achieve BLEU-4 / CIDEr / SPICE scores of 36.6 / 116.9 / 21.3 with cross-entropy loss and 38.7 / 129.1 / 22.4 with CIDEr optimization on the MSCOCO Karpathy test split, which surpass the results of other state-of-the-art methods.
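The soft switch between SG and RWS can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes a pointer-generator-style mixture: the final word distribution is a weighted sum of the SG branch's vocabulary distribution and the RWS branch's distribution over recalled words. All names (`mix_distributions`, `p_sg`, `recall_attn`, `switch`) and the toy numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_distributions(p_sg, recalled_ids, recall_attn, switch):
    """Blend two word distributions with a soft switch (assumed form).

    p_sg         -- vocabulary distribution from the SG (generation) branch
    recalled_ids -- vocabulary indices of the recalled words
    recall_attn  -- distribution over the recalled words (RWS branch)
    switch       -- scalar in [0, 1]; weight given to the SG branch
    """
    if not 0.0 <= switch <= 1.0:
        raise ValueError("switch must lie in [0, 1]")
    # Scale the generation distribution by the switch value.
    p_final = [switch * p for p in p_sg]
    # Add the copy probability mass onto the recalled words' vocabulary slots.
    for idx, weight in zip(recalled_ids, recall_attn):
        p_final[idx] += (1.0 - switch) * weight
    return p_final


# Toy example: a five-word vocabulary with two recalled words.
vocab = ["a", "dog", "frisbee", "catches", "grass"]
p_sg = softmax([1.0, 2.0, 0.5, 1.5, 0.2])
recalled_ids = [2, 4]                 # "frisbee", "grass"
recall_attn = softmax([2.0, 0.0])
p = mix_distributions(p_sg, recalled_ids, recall_attn, switch=0.7)
```

Because both inputs are probability distributions and the weights sum to one, the blended result is itself a valid distribution; recalled words gain probability mass relative to what the SG branch alone would scale to.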