Diffusion Models as Masked Autoencoders

Chen Wei, Karttikeya Mangalam, Po-Yao Huang, Yanghao Li, Haoqi Fan, Hu Xu, Huiyu Wang, Cihang Xie, Alan Yuille, Christoph Feichtenhofer

arXiv tech report. Project page: https://weichen582.github.io/diffmae.html

There has been a longstanding belief that generation can facilitate a true understanding of visual data. In line with this, we revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models. While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE). Our approach is capable of (i) serving as a strong initialization for downstream recognition tasks, (ii) conducting high-quality image inpainting, and (iii) being effortlessly extended to video where it produces state-of-the-art classification accuracy. We further perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
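To make "conditioning diffusion models on masked input" more concrete, here is a minimal NumPy sketch of how the input to such a model might be prepared. It is an illustration under stated assumptions, not the authors' implementation: the patch size, the 75% mask ratio, the noise level `alpha_bar_t`, and all function names are hypothetical, and the choice to add noise only to the masked patches while keeping the visible ones clean is one plausible reading of the abstract. The network that would denoise the masked patches is omitted.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into (N, patch*patch*C) flattened patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)

def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean mask where True marks a masked (hidden) patch."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask

def noise_masked_patches(patches, mask, alpha_bar_t, rng):
    """Forward-diffuse the masked patches only (assumed design):
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    x0 = patches[mask]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))             # stand-in for a real image
patches = patchify(image, patch=8)          # 16 patches, 192 values each
mask = random_mask(len(patches), 0.75, rng)
visible = patches[~mask]                    # clean patches used as conditioning
noisy_masked = noise_masked_patches(patches, mask, alpha_bar_t=0.5, rng=rng)
```

In this sketch a model would receive `visible` as clean conditioning and be trained to recover the original content of the masked patches from `noisy_masked`. Setting `alpha_bar_t` close to 0 leaves almost pure noise in the masked region, which resembles plain masked autoencoding where nothing about the masked patches is given.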
