DocSegTr: 一例级端对端文档图像分割图变形器 (DocSegTr: An Instance-Level End-to-End Document Image Segmentation Transformer) - 专知论文

会员服务 ·

0

Extensibility · 端到端 · 图像分割 · INFORMS · Performer ·

2022 年 1 月 27 日

DocSegTr: An Instance-Level End-to-End Document Image Segmentation Transformer

翻译：DocSegTr: 一例级端对端文档图像分割图变形器

Sanket Biswas,Ayan Banerjee,Josep Lladós,Umapada Pal

from arxiv, Submitted to International Workshop on Document Analysis Systems (DAS) 2022

Understanding documents with rich layouts is an essential step towards information extraction. Business intelligence processes often require the extraction of useful semantic content from documents at a large scale for subsequent decision-making tasks. In this context, instance-level segmentation of different document objects(title, sections, figures, tables and so on) has emerged as an interesting problem for the document layout analysis community. To advance the research in this direction, we present a transformer-based model for end-to-end segmentation of complex layouts in document images. To our knowledge, this is the first work on transformer-based document segmentation. Extensive experimentation on the PubLayNet dataset shows that our model achieved comparable or better segmentation performance than the existing state-of-the-art approaches. We hope our simple and flexible framework could serve as a promising baseline for instance-level recognition tasks in document images.

翻译：了解内容丰富的文件是信息提取的关键步骤。商业情报程序通常要求从大规模文件中提取有用的语义内容,用于随后的决策任务。在这方面,不同文件对象(标题、章节、图表、表格等)的例级分解已成为文件布局分析界一个有趣的问题。为了推进这方面的研究,我们提出了一个基于变压器的模型,用于文件图像中复杂布局的端到端分解。据我们所知,这是关于基于变压器的文件分解的首项工作。PubLayNet数据集的广泛实验表明,我们的模型比现有的最新方法实现了可比或更好的分解功能。我们希望我们简单而灵活的框架能够作为文件图像中实例级识别任务的有希望的基准。

1

相关内容

Extensibility

iOS 8 提供的应用间和应用跟系统的功能交互特性。

Today (iOS and OS X): widgets for the Today view of Notification Center
Share (iOS and OS X): post content to web services or share content with others
Actions (iOS and OS X): app extensions to view or manipulate inside another app
Photo Editing (iOS): edit a photo or video in Apple's Photos app with extensions from a third-party apps
Finder Sync (OS X): remote file storage in the Finder with support for Finder content annotation
Storage Provider (iOS): an interface between files inside an app and other apps on a user's device
Custom Keyboard (iOS): system-wide alternative keyboards

Source: iOS 8 Extensions: Apple’s Plan for a Powerful App Ecosystem

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

专知会员服务

20+阅读 · 2022年3月18日

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

专知会员服务

26+阅读 · 2022年3月3日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

129+阅读 · 2021年6月16日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

122+阅读 · 2020年7月18日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

18+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

56+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

53+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

77+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

64+阅读 · 2019年10月9日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

39+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

23+阅读 · 2019年5月22日

TorchSeg：基于pytorch的语义分割算法开源了

TorchSeg：基于pytorch的语义分割算法开源了

极市平台

20+阅读 · 2019年1月28日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

深度学习医学图像分析文献集

深度学习医学图像分析文献集

机器学习研究会

17+阅读 · 2017年10月13日

跟踪器融合的视觉跟踪方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

电压控制Click化学法构筑多通道DNA和核酸适配体电化学传感器

国家自然科学基金

0+阅读 · 2013年12月31日

基于FrameNet的中文评价词汇本体构建与观点挖掘研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于蚁群算法面向对象的遥感图像分类方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于彩色图像处理技术的PTV流场测量方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于空间平台的空间目标成像及识别方法研究

国家自然科学基金

2+阅读 · 2011年12月31日

基于模糊与核聚类算法的脑磁共振图像分割方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

气候变化下流域水文极端事件的响应机制及预测－以东江为例

国家自然科学基金

0+阅读 · 2009年12月31日

面向GIS的文本空间关系解析机制研究

国家自然科学基金

1+阅读 · 2009年12月31日

基于本体的Deep Web搜索技术

国家自然科学基金

2+阅读 · 2009年12月31日

Core Box Image Recognition and its Improvement with a New Augmentation Technique

Arxiv

0+阅读 · 2022年4月20日

FocusNet: Classifying Better by Focusing on Confusing Classes

Arxiv

0+阅读 · 2022年4月20日

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

Arxiv

0+阅读 · 2022年4月19日

UniGDD: A Unified Generative Framework for Goal-Oriented Document-Grounded Dialogue

Arxiv

0+阅读 · 2022年4月16日

ORCNet: A context-based network to simultaneously segment the ocular region components

Arxiv

0+阅读 · 2022年4月15日

Image Segmentation Using Deep Learning: A Survey

Arxiv

17+阅读 · 2020年11月15日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

Cross-Modal Self-Attention Network for Referring Image Segmentation

Cross-Modal Self-Attention Network for Referring Image Segmentation

Arxiv

17+阅读 · 2019年4月9日

nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation

Arxiv

12+阅读 · 2018年9月27日

DeepSeek: Content Based Image Search & Retrieval

Arxiv

11+阅读 · 2018年1月11日

VIP会员

文章信息

相关主题

相关VIP内容

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

专知会员服务

20+阅读 · 2022年3月18日

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

专知会员服务

26+阅读 · 2022年3月3日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

129+阅读 · 2021年6月16日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

122+阅读 · 2020年7月18日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

18+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

56+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

53+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

77+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

64+阅读 · 2019年10月9日

热门VIP内容

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

39+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

23+阅读 · 2019年5月22日

TorchSeg：基于pytorch的语义分割算法开源了

TorchSeg：基于pytorch的语义分割算法开源了

极市平台

20+阅读 · 2019年1月28日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

深度学习医学图像分析文献集

深度学习医学图像分析文献集

机器学习研究会

17+阅读 · 2017年10月13日

相关论文

Core Box Image Recognition and its Improvement with a New Augmentation Technique

Arxiv

0+阅读 · 2022年4月20日

FocusNet: Classifying Better by Focusing on Confusing Classes

Arxiv

0+阅读 · 2022年4月20日

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

Arxiv

0+阅读 · 2022年4月19日

UniGDD: A Unified Generative Framework for Goal-Oriented Document-Grounded Dialogue

Arxiv

0+阅读 · 2022年4月16日

ORCNet: A context-based network to simultaneously segment the ocular region components

Arxiv

0+阅读 · 2022年4月15日

Image Segmentation Using Deep Learning: A Survey

Arxiv

17+阅读 · 2020年11月15日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

Cross-Modal Self-Attention Network for Referring Image Segmentation

Cross-Modal Self-Attention Network for Referring Image Segmentation

Arxiv

17+阅读 · 2019年4月9日

nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation

Arxiv

12+阅读 · 2018年9月27日

DeepSeek: Content Based Image Search & Retrieval

Arxiv

11+阅读 · 2018年1月11日

相关基金

跟踪器融合的视觉跟踪方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

电压控制Click化学法构筑多通道DNA和核酸适配体电化学传感器

国家自然科学基金

0+阅读 · 2013年12月31日

基于FrameNet的中文评价词汇本体构建与观点挖掘研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于蚁群算法面向对象的遥感图像分类方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于彩色图像处理技术的PTV流场测量方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于空间平台的空间目标成像及识别方法研究

国家自然科学基金

2+阅读 · 2011年12月31日

基于模糊与核聚类算法的脑磁共振图像分割方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

气候变化下流域水文极端事件的响应机制及预测－以东江为例

国家自然科学基金

0+阅读 · 2009年12月31日

面向GIS的文本空间关系解析机制研究

国家自然科学基金

1+阅读 · 2009年12月31日

基于本体的Deep Web搜索技术

国家自然科学基金

2+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员