August 12, 2022

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis


Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

From arXiv, 24 pages, 19 figures

In this paper, we present NUWA-Infinity, a generative model for infinite visual synthesis, defined as the task of generating arbitrarily sized high-resolution images or long-duration videos. To handle this variable-size generation task, we propose an autoregressive over autoregressive generation mechanism: a global patch-level autoregressive model captures the dependencies between patches, and a local token-level autoregressive model captures the dependencies between visual tokens within each patch. A Nearby Context Pool (NCP) caches related patches that have already been generated and supplies them as context for the patch currently being generated, which significantly reduces computation cost without sacrificing patch-level dependency modeling. An Arbitrary Direction Controller (ADC) decides a suitable generation order for each visual synthesis task and learns order-aware positional embeddings. Compared to DALL-E, Imagen, and Parti, NUWA-Infinity can generate high-resolution images of arbitrary size and additionally supports long-duration video generation. Compared to NUWA, which also covers images and videos, NUWA-Infinity has superior visual synthesis capabilities in terms of resolution and variable-size generation. The GitHub link is https://github.com/microsoft/NUWA. The homepage link is https://nuwa-infinity.microsoft.com.
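The nested generation scheme in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`raster_order`, `nearby`, `sample_token`, `generate`), the random token sampler, the fixed raster order standing in for the Arbitrary Direction Controller, and the eviction rule for the context pool are all assumptions made for illustration. It only shows the control flow: an outer loop over patches, an inner loop over tokens within each patch, and a bounded cache of nearby patches used as context.

```python
import random
from typing import Dict, List, Tuple

Patch = List[int]        # a patch is a flat list of visual-token ids
Coord = Tuple[int, int]  # (row, col) position of a patch in the grid


def raster_order(rows: int, cols: int) -> List[Coord]:
    """One possible generation order (assumed): left-to-right, top-to-bottom."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def nearby(coord: Coord, pool: Dict[Coord, Patch], radius: int) -> List[Coord]:
    """Cached patch coordinates within `radius` of `coord` (Chebyshev distance)."""
    r, c = coord
    return sorted(p for p in pool if max(abs(p[0] - r), abs(p[1] - c)) <= radius)


def sample_token(context: List[Patch], prefix: List[int],
                 vocab: int, rng: random.Random) -> int:
    """Stand-in for the local token-level model.

    A real model would condition on `context` and `prefix`;
    this placeholder just draws a random token id.
    """
    return rng.randrange(vocab)


def generate(rows: int, cols: int, tokens_per_patch: int = 4,
             vocab: int = 16, radius: int = 1, seed: int = 0):
    rng = random.Random(seed)
    pool: Dict[Coord, Patch] = {}    # toy nearby context pool
    canvas: Dict[Coord, Patch] = {}  # every generated patch
    max_pool = 0
    for coord in raster_order(rows, cols):       # global, patch-level loop
        context = [pool[p] for p in nearby(coord, pool, radius)]
        patch: Patch = []
        for _ in range(tokens_per_patch):        # local, token-level loop
            patch.append(sample_token(context, patch, vocab, rng))
        canvas[coord] = patch
        pool[coord] = patch
        # Evict rows that can no longer be near any future patch in raster order.
        for stale in [p for p in pool if p[0] < coord[0] - radius]:
            del pool[stale]
        max_pool = max(max_pool, len(pool))
    return canvas, max_pool


canvas, max_pool = generate(rows=4, cols=5)
```

In this sketch the pool never holds more than `(radius + 1) * cols` patches, regardless of how many rows are generated, which is the intuition behind caching only nearby context instead of attending to everything generated so far.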

