Learning Multimodal Affinities for Textual Editing in Images - Zhuanzhi Papers


Tags: multimodal · entity · learned · pairwise · cluster

March 18, 2021


Or Perel, Oron Anschel, Omri Ben-Eliezer, Shai Mazor, Hadar Averbuch-Elor

From arXiv. ACM Transactions on Graphics 2021, to be presented at SIGGRAPH 2021.

Nowadays, as cameras are rapidly adopted in our daily routine, images of documents are becoming both abundant and prevalent. Unlike natural images that capture physical objects, document-images contain a significant amount of text with critical semantics and complicated layouts. In this work, we devise a generic unsupervised technique to learn multimodal affinities between textual entities in a document-image, considering their visual style, the content of their underlying text and their geometric context within the image. We then use these learned affinities to automatically cluster the textual entities in the image into different semantic groups. The core of our approach is a deep optimization scheme dedicated to an image provided by the user that detects and leverages reliable pairwise connections in the multimodal representation of the textual elements in order to properly learn the affinities. We show that our technique can operate on highly varied images spanning a wide range of documents and demonstrate its applicability for various editing operations manipulating the content, appearance and geometry of the image.
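The abstract describes a pipeline with three stages: describe each textual entity in several modalities (visual style, text content, geometry), score pairwise affinities between entities, and group entities by those affinities. The sketch below illustrates only that overall shape, under strong simplifying assumptions. The per-modality features are given as input (the `style`, `semantic` and `center` fields are hypothetical), the affinity is a fixed weighted sum of cosine similarities plus a Gaussian kernel on position distance, and grouping is plain union-find over pairs whose affinity clears a threshold. This is not the authors' method: the paper learns the affinities with a per-image deep optimization, whereas the fixed `weights`, `sigma` and `threshold` here merely stand in for what is learned.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TextEntity:
    text: str
    style: Sequence[float]      # hypothetical visual-style descriptor (e.g. color, font size)
    semantic: Sequence[float]   # hypothetical embedding of the entity's text content
    center: Tuple[float, float] # normalized (x, y) position of the entity in the image


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def affinity(e1: TextEntity, e2: TextEntity,
             weights=(1 / 3, 1 / 3, 1 / 3), sigma: float = 0.2) -> float:
    """Fixed weighted combination of style, content and geometric similarity
    (an illustrative stand-in for the learned multimodal affinity)."""
    w_style, w_sem, w_geo = weights
    dist = math.dist(e1.center, e2.center)
    geo = math.exp(-(dist ** 2) / (2 * sigma ** 2))  # nearby entities score close to 1
    return (w_style * cosine(e1.style, e2.style)
            + w_sem * cosine(e1.semantic, e2.semantic)
            + w_geo * geo)


def cluster(entities: List[TextEntity], threshold: float = 0.8) -> List[List[int]]:
    """Group entity indices: pairs with affinity >= threshold are treated as
    reliable links and merged with union-find."""
    parent = list(range(len(entities)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(entities)):
        for j in range(i + 1, len(entities)):
            if affinity(entities[i], entities[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(entities)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For a hypothetical menu image with two price labels sharing one style and two dish names sharing another, `cluster` would separate prices from dish names; an editing tool could then restyle or move each group as a unit. A single hard threshold is the simplest way to keep only high-confidence pairs, but it is sensitive to the chosen weights, which is exactly what a learned affinity is meant to avoid.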


