用于图像说明的等级分解 (Hierarchy Parsing for Image Captioning) - 专知论文

会员服务 ·

0

图像字幕 · 可理解性 · Extensibility · 图卷积神经网络/图卷积网络 · Networking ·

2019 年 9 月 10 日

Hierarchy Parsing for Image Captioning

翻译：用于图像说明的等级分解

Ting Yao,Yingwei Pan,Yehao Li,Tao Mei

from arxiv, ICCV 2019

It is always well believed that parsing an image into constituent visual patterns would be helpful for understanding and representing an image. Nevertheless, there has not been evidence in support of the idea on describing an image with a natural-language utterance. In this paper, we introduce a new design to model a hierarchy from instance level (segmentation), region level (detection) to the whole image to delve into a thorough image understanding for captioning. Specifically, we present a HIerarchy Parsing (HIP) architecture that novelly integrates hierarchical structure into image encoder. Technically, an image decomposes into a set of regions and some of the regions are resolved into finer ones. Each region then regresses to an instance, i.e., foreground of the region. Such process naturally builds a hierarchal tree. A tree-structured Long Short-Term Memory (Tree-LSTM) network is then employed to interpret the hierarchal structure and enhance all the instance-level, region-level and image-level features. Our HIP is appealing in view that it is pluggable to any neural captioning models. Extensive experiments on COCO image captioning dataset demonstrate the superiority of HIP. More remarkably, HIP plus a top-down attention-based LSTM decoder increases CIDEr-D performance from 120.1% to 127.2% on COCO Karpathy test split. When further endowing instance-level and region-level features from HIP with semantic relation learnt through Graph Convolutional Networks (GCN), CIDEr-D is boosted up to 130.6%.

翻译：人们总是相信,将图像剖析成成成形视觉图案将有助于理解和代表图像。然而,没有证据表明支持用自然语言的语义描述图像的想法。在本文件中,我们引入了一个新的设计,从实例级别(分层)、区域级别(检测)到整个图像的等级结构模型,以深入到对字幕的完整图像理解。具体地说,我们展示了一种将等级结构重新整合成图像编码器的循环分析(HIP)架构。在技术上,将图像分层结构分解成一组区域,有些区域特性被解成更细的。在本文中,我们引入了一个新的设计,从实例级别(分层)、区域(分层)、区域(检测)到整个图像结构(Tree-LSTM),然后用于解释等级结构结构结构结构结构结构结构(Tree-LSTM)结构结构,并增强所有实例、区域级和图像级的特征。我们的CO-O级数据水平(O-I级),从C-O级水平向上展示了C级水平的升级水平,然后是高端级的图像模型,再展示高端数据模型。

6

相关内容

图像字幕

图像字幕（Image Captioning）,是指从图像生成文本描述的过程，主要根据图像中物体和物体的动作。

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【哈佛大学】机器学习的层次局限性，A Hierarchy of Limitations in Machine Learning

【哈佛大学】机器学习的层次局限性，A Hierarchy of Limitations in Machine Learning

专知会员服务

47+阅读 · 2020年2月12日

【超分辨率| 2019最新综述】图像超分辨率的深度学习，附PDF（Deep Learning for Image Super-resolution: A Survey）

【超分辨率| 2019最新综述】图像超分辨率的深度学习，附PDF（Deep Learning for Image Super-resolution: A Survey）

专知会员服务

60+阅读 · 2019年11月16日

【图像分割| 2019最新综述】理解图像分割的深度学习技术，附58页PDF（Understanding Deep Learning Techniques for Image Segmentation）

【图像分割| 2019最新综述】理解图像分割的深度学习技术，附58页PDF（Understanding Deep Learning Techniques for Image Segmentation）

专知会员服务

59+阅读 · 2019年11月16日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Image Captioning 36页最新综述， 161篇参考文献

Image Captioning 36页最新综述， 161篇参考文献

专知

90+阅读 · 2018年10月23日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

论文 | CVPR2017有哪些值得读的Image Caption论文？

论文 | CVPR2017有哪些值得读的Image Caption论文？

黑龙江大学自然语言处理实验室

16+阅读 · 2017年12月1日

Simple Recurrent Unit For Sentence Classification

Simple Recurrent Unit For Sentence Classification

哈工大SCIR

6+阅读 · 2017年11月29日

CVPR2017有哪些值得读的Image Caption论文？

CVPR2017有哪些值得读的Image Caption论文？

PaperWeekly

10+阅读 · 2017年11月29日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

Scene-based Factored Attention for Image Captioning

Arxiv

4+阅读 · 2019年8月7日

Neural Image Captioning

Neural Image Captioning

Arxiv

5+阅读 · 2019年7月2日

A sequential guiding network with attention for image captioning

A sequential guiding network with attention for image captioning

Arxiv

5+阅读 · 2019年2月8日

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Arxiv

5+阅读 · 2018年12月26日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

CNN+CNN: Convolutional Decoders for Image Captioning

Arxiv

21+阅读 · 2018年5月23日

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

Arxiv

7+阅读 · 2018年5月21日

Image Captioning

Arxiv

11+阅读 · 2018年5月13日

Learning to Guide Decoding for Image Captioning

Arxiv

6+阅读 · 2018年4月3日

Deep Learning for Video Classification and Captioning

Arxiv

9+阅读 · 2018年2月22日

VIP会员

文章信息

相关主题

图卷积神经网络/图卷积网络

相关VIP内容

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【哈佛大学】机器学习的层次局限性，A Hierarchy of Limitations in Machine Learning

【哈佛大学】机器学习的层次局限性，A Hierarchy of Limitations in Machine Learning

专知会员服务

47+阅读 · 2020年2月12日

【超分辨率| 2019最新综述】图像超分辨率的深度学习，附PDF（Deep Learning for Image Super-resolution: A Survey）

【超分辨率| 2019最新综述】图像超分辨率的深度学习，附PDF（Deep Learning for Image Super-resolution: A Survey）

专知会员服务

60+阅读 · 2019年11月16日

【图像分割| 2019最新综述】理解图像分割的深度学习技术，附58页PDF（Understanding Deep Learning Techniques for Image Segmentation）

【图像分割| 2019最新综述】理解图像分割的深度学习技术，附58页PDF（Understanding Deep Learning Techniques for Image Segmentation）

专知会员服务

59+阅读 · 2019年11月16日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

GPT-5如何对齐？从硬性拒绝到安全完成：走向以输出为中心的安全训练

【伯克利博士论文】超越人类监督的视觉智能

【ICCV2025】SO(3) 上连续非保守动力系统的预测

2025年中国数据要素行业发展研究报告

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Image Captioning 36页最新综述， 161篇参考文献

Image Captioning 36页最新综述， 161篇参考文献

专知

90+阅读 · 2018年10月23日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

论文 | CVPR2017有哪些值得读的Image Caption论文？

论文 | CVPR2017有哪些值得读的Image Caption论文？

黑龙江大学自然语言处理实验室

16+阅读 · 2017年12月1日

Simple Recurrent Unit For Sentence Classification

Simple Recurrent Unit For Sentence Classification

哈工大SCIR

6+阅读 · 2017年11月29日

CVPR2017有哪些值得读的Image Caption论文？

CVPR2017有哪些值得读的Image Caption论文？

PaperWeekly

10+阅读 · 2017年11月29日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

相关论文

Scene-based Factored Attention for Image Captioning

Arxiv

4+阅读 · 2019年8月7日

Neural Image Captioning

Neural Image Captioning

Arxiv

5+阅读 · 2019年7月2日

A sequential guiding network with attention for image captioning

A sequential guiding network with attention for image captioning

Arxiv

5+阅读 · 2019年2月8日

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Arxiv

5+阅读 · 2018年12月26日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

CNN+CNN: Convolutional Decoders for Image Captioning

Arxiv

21+阅读 · 2018年5月23日

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

Arxiv

7+阅读 · 2018年5月21日

Image Captioning

Arxiv

11+阅读 · 2018年5月13日

Learning to Guide Decoding for Image Captioning

Arxiv

6+阅读 · 2018年4月3日

Deep Learning for Video Classification and Captioning

Arxiv

9+阅读 · 2018年2月22日

微信扫码咨询专知VIP会员