Transferring knowledge from an image synthesis model trained on a large dataset is a promising direction for efficiently learning generative image models across diverse domains. While previous works have studied GAN models, we present a recipe for learning vision transformers by generative knowledge transfer. We base our framework on state-of-the-art generative vision transformers that represent an image as a sequence of visual tokens fed to autoregressive or non-autoregressive transformers. To adapt to a new domain, we employ prompt tuning, which prepends learnable tokens, called a prompt, to the image token sequence, and we introduce a new prompt design for our task. We study a variety of visual domains, including the visual task adaptation benchmark~\cite{zhai2019large}, with varying amounts of training images, and demonstrate the effectiveness of knowledge transfer along with significantly better image generation quality than existing works.
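The prompt-tuning step described above can be sketched minimally: learnable prompt tokens are prepended to the image token sequence before it enters the frozen pretrained transformer. All sizes below (8 prompt tokens, 256 image tokens, embedding dimension 64) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
n_prompt, n_image, d = 8, 256, 64

# The prompt is the only set of new parameters tuned for the target
# domain; the pretrained transformer weights stay frozen.
prompt = rng.normal(scale=0.02, size=(n_prompt, d))

# Embeddings of the quantized visual tokens of one image.
image_tokens = rng.normal(size=(n_image, d))

# Prompt tuning prepends the learnable tokens to the image token
# sequence fed to the (auto- or non-autoregressive) transformer.
sequence = np.concatenate([prompt, image_tokens], axis=0)
print(sequence.shape)  # (264, 64)
```

During adaptation, gradients flow only into `prompt`, which keeps the number of trainable parameters small relative to full fine-tuning.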