LayoutDETR: Detection Transformer是一个出色的多模式布局设计工具 (LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer) - 专知论文

会员服务 ·

0

多模式 · 设计 · 多模 · 内容感知 · 基准 ·

2023 年 3 月 24 日

LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer

翻译：LayoutDETR: Detection Transformer是一个出色的多模式布局设计工具

Ning Yu,Chia-Chih Chen,Zeyuan Chen,Rui Meng,Gang Wu,Paul Josel,Juan Carlos Niebles,Caiming Xiong,Ran Xu

Graphic layout designs play an essential role in visual communication. Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production. Generative models emerge to make design automation scalable but it remains non-trivial to produce designs that comply with designers' multimodal desires, i.e., constrained by background images and driven by foreground content. We propose LayoutDETR that inherits the high quality and realism from generative modeling, while reformulating content-aware requirements as a detection problem: we learn to detect in a background image the reasonable locations, scales, and spatial relations for multimodal foreground elements in a layout. Our solution sets a new state-of-the-art performance for layout generation on public benchmarks and on our newly-curated ad banner dataset. We integrate our solution into a graphical system that facilitates user studies, and show that users prefer our designs over baselines by significant margins. Our code, models, dataset, graphical system, and demos are available at https://github.com/salesforce/LayoutDETR.

翻译：图形布局设计在视觉传达中扮演着至关重要的角色。然而，手工制作布局设计是需要技能的、耗时的，并且无法批量生产。生成模型被用来使设计自动化变得可扩展，但是生产符合设计师多模态需求、即在背景图像约束下、在前景内容驱动下的设计仍然不容易。我们提出了LayoutDETR，它继承了生成建模的高质量和真实性，同时将内容感知的要求重新定义为一个检测问题：我们学会在背景图像中检测多模态前景元素在布局中合理的位置、比例和空间关系。我们的解决方案在公共基准测试和我们新策划的广告横幅数据集上设定了新的最高性能水平。我们将我们的解决方案集成到了一个图形系统中，该系统可以促进用户研究，并显示用户相比于基准系统更喜欢我们的设计。我们的代码、模型、数据集、图形系统和演示在 https://github.com/salesforce/LayoutDETR 上提供。

0

相关内容

多模式

【干货书】Python强化学习算法:学习、理解和开发智能算法以应对人工智能挑战，356页pdf，附代码

【干货书】Python强化学习算法:学习、理解和开发智能算法以应对人工智能挑战，356页pdf，附代码

专知会员服务

58+阅读 · 2022年12月10日

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

专知会员服务

28+阅读 · 2022年3月3日

【AAAI2021】面向真实世界的鲁棒视觉信息提取:新的数据集和新颖的解决方案

【AAAI2021】面向真实世界的鲁棒视觉信息提取:新的数据集和新颖的解决方案

专知会员服务

22+阅读 · 2021年2月17日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

从设计交付到开发，轻松畅快高效率！

从设计交付到开发，轻松畅快高效率！

谷歌开发者

0+阅读 · 2022年6月29日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

【泡泡一分钟】扫描环境：用于3D点云地图中场景识别的自我中心空间描述符

【泡泡一分钟】扫描环境：用于3D点云地图中场景识别的自我中心空间描述符

泡泡机器人SLAM

22+阅读 · 2019年1月17日

【泡泡一分钟】基于机器人的视觉惯性里程计（IROS2018-10）

【泡泡一分钟】基于机器人的视觉惯性里程计（IROS2018-10）

泡泡机器人SLAM

13+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新五篇视频分类相关论文—细粒度行人识别、群组归一化、MLtuner、时序特征

【论文推荐】最新五篇视频分类相关论文—细粒度行人识别、群组归一化、MLtuner、时序特征

专知

22+阅读 · 2018年4月21日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

专知

29+阅读 · 2018年3月12日

重组人磷脂酶D2干预哮喘中Treg特征的研究

国家自然科学基金

0+阅读 · 2016年12月31日

面向复杂管路网络流固耦合的阻抗分析方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向词汇功能的学术文本语义识别与知识图谱构建

国家自然科学基金

5+阅读 · 2014年12月31日

基于静压气浮无摩擦气缸的高精度重力补偿系统的研究

国家自然科学基金

0+阅读 · 2013年12月31日

面向快速成型的转子叶片动力学光滑拓扑优化设计

国家自然科学基金

0+阅读 · 2013年12月31日

可重构变胞并联机器人机构综合设计方法与驱动配置研究

国家自然科学基金

1+阅读 · 2012年12月31日

低交叉极化共形天线阵列综合的混合DE算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于Volterra级数的ADC数字后台校正技术研究

国家自然科学基金

0+阅读 · 2011年12月31日

乳腺癌PIWIL2、PLAC1抗原HLA-A2限制性CTL表位鉴定以及综合改造策略研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于复杂加工对象精度需求的高性能复杂制造装备主动精度设计方法

国家自然科学基金

0+阅读 · 2008年12月31日

Concurrent Misclassification and Out-of-Distribution Detection for Semantic Segmentation via Energy-Based Normalizing Flow

Arxiv

0+阅读 · 2023年5月16日

Fooling State-of-the-Art Deepfake Detection with High-Quality Deepfakes

Arxiv

1+阅读 · 2023年5月16日

Out-of-Distribution Detection for Adaptive Computer Vision

Arxiv

0+阅读 · 2023年5月16日

Neuromodulation Gated Transformer

Arxiv

0+阅读 · 2023年5月11日

Generalized Out-of-Distribution Detection: A Survey

Generalized Out-of-Distribution Detection: A Survey

Arxiv

15+阅读 · 2021年10月21日

A Comprehensive Survey on Community Detection with Deep Learning

Arxiv

14+阅读 · 2021年5月26日

Contrastive Triple Extraction with Generative Transformer

Arxiv

13+阅读 · 2021年2月4日

Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

Arxiv

10+阅读 · 2021年1月24日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Zero-Shot Object Detection by Hybrid Region Embedding

Arxiv

19+阅读 · 2018年5月17日

VIP会员

文章信息

相关主题

相关VIP内容

【干货书】Python强化学习算法:学习、理解和开发智能算法以应对人工智能挑战，356页pdf，附代码

【干货书】Python强化学习算法:学习、理解和开发智能算法以应对人工智能挑战，356页pdf，附代码

专知会员服务

58+阅读 · 2022年12月10日

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

专知会员服务

28+阅读 · 2022年3月3日

【AAAI2021】面向真实世界的鲁棒视觉信息提取:新的数据集和新颖的解决方案

【AAAI2021】面向真实世界的鲁棒视觉信息提取:新的数据集和新颖的解决方案

专知会员服务

22+阅读 · 2021年2月17日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

自动驾驶轨迹规划中的基础模型：进展综述与开放挑战

《用于提升多域战备的大型语言模型辅助场景生成器》报告

【斯坦福博士论文】为人类使用优化 AI 模型

国防领域人工智能规模化应用的理论与实践

相关资讯

从设计交付到开发，轻松畅快高效率！

从设计交付到开发，轻松畅快高效率！

谷歌开发者

0+阅读 · 2022年6月29日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

【泡泡一分钟】扫描环境：用于3D点云地图中场景识别的自我中心空间描述符

【泡泡一分钟】扫描环境：用于3D点云地图中场景识别的自我中心空间描述符

泡泡机器人SLAM

22+阅读 · 2019年1月17日

【泡泡一分钟】基于机器人的视觉惯性里程计（IROS2018-10）

【泡泡一分钟】基于机器人的视觉惯性里程计（IROS2018-10）

泡泡机器人SLAM

13+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新五篇视频分类相关论文—细粒度行人识别、群组归一化、MLtuner、时序特征

【论文推荐】最新五篇视频分类相关论文—细粒度行人识别、群组归一化、MLtuner、时序特征

专知

22+阅读 · 2018年4月21日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

专知

29+阅读 · 2018年3月12日

相关论文

Concurrent Misclassification and Out-of-Distribution Detection for Semantic Segmentation via Energy-Based Normalizing Flow

Arxiv

0+阅读 · 2023年5月16日

Fooling State-of-the-Art Deepfake Detection with High-Quality Deepfakes

Arxiv

1+阅读 · 2023年5月16日

Out-of-Distribution Detection for Adaptive Computer Vision

Arxiv

0+阅读 · 2023年5月16日

Neuromodulation Gated Transformer

Arxiv

0+阅读 · 2023年5月11日

Generalized Out-of-Distribution Detection: A Survey

Generalized Out-of-Distribution Detection: A Survey

Arxiv

15+阅读 · 2021年10月21日

A Comprehensive Survey on Community Detection with Deep Learning

Arxiv

14+阅读 · 2021年5月26日

Contrastive Triple Extraction with Generative Transformer

Arxiv

13+阅读 · 2021年2月4日

Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

Arxiv

10+阅读 · 2021年1月24日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Zero-Shot Object Detection by Hybrid Region Embedding

Arxiv

19+阅读 · 2018年5月17日

相关基金

重组人磷脂酶D2干预哮喘中Treg特征的研究

国家自然科学基金

0+阅读 · 2016年12月31日

面向复杂管路网络流固耦合的阻抗分析方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向词汇功能的学术文本语义识别与知识图谱构建

国家自然科学基金

5+阅读 · 2014年12月31日

基于静压气浮无摩擦气缸的高精度重力补偿系统的研究

国家自然科学基金

0+阅读 · 2013年12月31日

面向快速成型的转子叶片动力学光滑拓扑优化设计

国家自然科学基金

0+阅读 · 2013年12月31日

可重构变胞并联机器人机构综合设计方法与驱动配置研究

国家自然科学基金

1+阅读 · 2012年12月31日

低交叉极化共形天线阵列综合的混合DE算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于Volterra级数的ADC数字后台校正技术研究

国家自然科学基金

0+阅读 · 2011年12月31日

乳腺癌PIWIL2、PLAC1抗原HLA-A2限制性CTL表位鉴定以及综合改造策略研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于复杂加工对象精度需求的高性能复杂制造装备主动精度设计方法

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员