High-Resolution Complex Scene Synthesis with Transformers


May 13, 2021

Manuel Jahn, Robin Rombach, Björn Ommer

From arXiv; AI for Content Creation Workshop, CVPR 2021

The use of coarse-grained layouts for controllable synthesis of complex scene images via deep generative models has recently gained popularity. However, results of current approaches still fall short of their promise of high-resolution synthesis. We hypothesize that this is mostly due to the highly engineered nature of these approaches which often rely on auxiliary losses and intermediate steps such as mask generators. In this note, we present an orthogonal approach to this task, where the generative model is based on pure likelihood training without additional objectives. To do so, we first optimize a powerful compression model with adversarial training which learns to reconstruct its inputs via a discrete latent bottleneck and thereby effectively strips the latent representation of high-frequency details such as texture. Subsequently, we train an autoregressive transformer model to learn the distribution of the discrete image representations conditioned on a tokenized version of the layouts. Our experiments show that the resulting system is able to synthesize high-quality images consistent with the given layouts. In particular, we improve the state-of-the-art FID score on COCO-Stuff and on Visual Genome by up to 19% and 53% and demonstrate the synthesis of images up to 512 x 512 px on COCO and Open Images.
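
The abstract mentions conditioning the autoregressive transformer on "a tokenized version of the layouts" but does not spell out the encoding. The following is a minimal sketch of one plausible scheme, not the authors' implementation: each bounding box is assumed to become three tokens (class, top-left corner, bottom-right corner), and the layout tokens are prepended to the discrete image codes. The names `Box`, `quantize_point`, `tokenize_layout`, and `build_sequence`, along with the grid size and padding convention, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    class_id: int  # index of the object category (assumed 0-based)
    x0: float      # left edge, normalized to [0, 1]
    y0: float      # top edge, normalized to [0, 1]
    x1: float      # right edge, normalized to [0, 1]
    y1: float      # bottom edge, normalized to [0, 1]


def quantize_point(x: float, y: float, grid: int) -> int:
    """Map a normalized (x, y) point to a single cell index on a grid x grid lattice."""
    col = min(int(x * grid), grid - 1)
    row = min(int(y * grid), grid - 1)
    return row * grid + col


def tokenize_layout(boxes: List[Box], num_classes: int, grid: int,
                    max_objects: int) -> List[int]:
    """Encode each box as (class, top-left cell, bottom-right cell), padded to fixed length.

    Assumed vocabulary: class ids first, then grid cells, then one padding token.
    """
    pad_token = num_classes + grid * grid
    tokens: List[int] = []
    for box in boxes[:max_objects]:
        tokens.append(box.class_id)
        tokens.append(num_classes + quantize_point(box.x0, box.y0, grid))
        tokens.append(num_classes + quantize_point(box.x1, box.y1, grid))
    tokens += [pad_token] * (3 * max_objects - len(tokens))
    return tokens


def build_sequence(layout_tokens: List[int], image_tokens: List[int],
                   layout_vocab_size: int) -> List[int]:
    """Prepend layout tokens to image codes, offsetting the codes to avoid id collisions."""
    return layout_tokens + [layout_vocab_size + t for t in image_tokens]


if __name__ == "__main__":
    layout = tokenize_layout([Box(2, 0.0, 0.0, 0.5, 0.5)],
                             num_classes=10, grid=4, max_objects=2)
    # The image codes here are placeholders standing in for the compression model's output.
    sequence = build_sequence(layout, image_tokens=[0, 5], layout_vocab_size=27)
    print(layout)
    print(sequence)
```

Under these assumptions, a transformer trained with next-token likelihood on such sequences would model the image codes given the layout prefix, which matches the "pure likelihood training without additional objectives" described above.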

