May 6, 2023

Structure-CLIP: Enhance Multi-modal Language Representations with Structure Knowledge

Yufeng Huang, Jiji Tang, Zhuo Chen, Rongsheng Zhang, Xinfeng Zhang, Weijie Chen, Zeng Zhao, Tangjie Lv, Zhipeng Hu, Wen Zhang

From arXiv. Work in progress.

Large-scale vision-language pre-training has shown promising advances on various downstream tasks and achieved strong performance in multi-modal understanding and generation. However, existing methods often perform poorly on image-text matching tasks that require a detailed semantic understanding of the text. Although some prior work addresses this problem, it does not sufficiently exploit the structural knowledge present in sentences to enhance multi-modal language representations, which leads to poor performance. In this paper, we present Structure-CLIP, an end-to-end framework that integrates latent detailed semantics from the text to enhance fine-grained semantic representations. Specifically, (1) we use scene graphs to pay more attention to detailed semantic learning in the text and to fully explore structured knowledge between fine-grained semantics, and (2) we use a knowledge-enhanced framework, built with the help of the scene graph, to make full use of representations of structured knowledge. To verify the effectiveness of the proposed method, we pre-train our models with this approach and conduct experiments on different downstream tasks. Numerical results show that Structure-CLIP can often achieve state-of-the-art performance on both the VG-Attribution and VG-Relation datasets. Extensive experiments show that its components are effective and its predictions are interpretable, which indicates that the proposed method can enhance detailed semantic representations well.
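As a rough illustration of why sentence structure matters for image-text matching, the toy sketch below represents a caption as a single scene-graph triple and builds a role-swapped negative caption. It is not the Structure-CLIP implementation: the `Triple` type, the helper functions, and the example caption are all hypothetical and chosen only for illustration. The point is that a representation that ignores word order cannot tell the two captions apart, while the structured triples differ.

```python
from collections import Counter
from typing import NamedTuple


class Triple(NamedTuple):
    """One (subject, relation, object) edge of a toy scene graph (hypothetical type)."""
    subject: str
    relation: str
    obj: str


def to_caption(t: Triple) -> str:
    """Render a triple as a simple caption."""
    return f"the {t.subject} is {t.relation} the {t.obj}"


def swap_roles(t: Triple) -> Triple:
    """Build a negative example by exchanging subject and object."""
    return Triple(t.obj, t.relation, t.subject)


def bag_of_words(caption: str) -> Counter:
    """Order-insensitive view of a caption: word counts only."""
    return Counter(caption.lower().split())


positive = Triple("horse", "eating", "grass")
negative = swap_roles(positive)

pos_caption = to_caption(positive)
neg_caption = to_caption(negative)

# The two captions contain exactly the same words...
same_bag = bag_of_words(pos_caption) == bag_of_words(neg_caption)
# ...but their scene-graph triples are different.
same_structure = positive == negative

print(pos_caption, "|", neg_caption)
print("same bag of words:", same_bag)
print("same structure:", same_structure)
```

In this sketch the captions share a bag of words but not a structure, which is the kind of distinction that relation- and attribute-focused matching benchmarks are meant to probe.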
