In this paper, we present NUWA-Infinity, a generative model for infinite visual synthesis, which we define as the task of generating arbitrarily sized high-resolution images or long-duration videos. To handle this variable-size generation task, we propose an autoregressive-over-autoregressive generation mechanism: a global patch-level autoregressive model captures the dependencies between patches, and a local token-level autoregressive model captures the dependencies between visual tokens within each patch. A Nearby Context Pool (NCP) caches related patches that have already been generated and supplies them as context for the patch currently being generated, which significantly reduces computation cost without sacrificing patch-level dependency modeling. An Arbitrary Direction Controller (ADC) decides suitable generation orders for different visual synthesis tasks and learns order-aware positional embeddings. Compared to DALL-E, Imagen and Parti, NUWA-Infinity can generate high-resolution images of arbitrary size and additionally supports long-duration video generation. Compared to NUWA, which also covers images and videos, NUWA-Infinity has superior visual synthesis capabilities in terms of resolution and variable-size generation. The GitHub link is https://github.com/microsoft/NUWA. The homepage link is https://nuwa-infinity.microsoft.com.
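The two-level generation loop and the context cache described above can be illustrated with a minimal sketch. This is not the NUWA-Infinity implementation: the function names (`nearby`, `sample_token`, `generate`), the fixed raster generation order, the square neighbourhood of radius `radius`, and the eviction rule are all assumptions made for illustration, and a seeded random sampler stands in for the learned token-level model. The sketch shows only the control flow: an outer loop over patches, an inner loop over the tokens of each patch, and a pool that keeps just the patches still needed as context.

```python
import random
from typing import Dict, List, Tuple

Coord = Tuple[int, int]   # (row, column) of a patch on the canvas
Patch = List[int]         # visual tokens of one patch


def nearby(coord: Coord, pool: Dict[Coord, Patch], radius: int) -> Dict[Coord, Patch]:
    """Return cached patches within `radius` of `coord` (hypothetical neighbourhood rule)."""
    r, c = coord
    return {p: t for p, t in pool.items()
            if abs(p[0] - r) <= radius and abs(p[1] - c) <= radius}


def sample_token(context: Dict[Coord, Patch], prefix: Patch,
                 vocab_size: int, rng: random.Random) -> int:
    """Placeholder for the token-level model: a real model would condition on
    `context` and `prefix`; here a token is drawn uniformly at random."""
    return rng.randrange(vocab_size)


def generate(rows: int, cols: int, tokens_per_patch: int,
             vocab_size: int = 16, radius: int = 1, seed: int = 0):
    rng = random.Random(seed)
    pool: Dict[Coord, Patch] = {}     # cache of patches still usable as context
    canvas: Dict[Coord, Patch] = {}   # everything generated so far
    max_pool = 0
    # Global, patch-level autoregression (raster order is an assumption).
    for r in range(rows):
        for c in range(cols):
            context = nearby((r, c), pool, radius)
            patch: Patch = []
            # Local, token-level autoregression within the current patch.
            for _ in range(tokens_per_patch):
                patch.append(sample_token(context, patch, vocab_size, rng))
            canvas[(r, c)] = patch
            pool[(r, c)] = patch
            # Evict rows that no later patch can reach in raster order.
            for p in [p for p in pool if p[0] < r - radius]:
                del pool[p]
            max_pool = max(max_pool, len(pool))
    return canvas, max_pool
```

Under these assumptions the pool never holds more than `(radius + 1) * cols` patches, so the context cost per patch stays bounded as the canvas grows in height, while `canvas` still records every generated patch. Replacing the raster loop with a different visiting order would require a matching eviction rule.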