The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training

The self-supervised Masked Image Modeling (MIM) schema, which follows a "mask-and-reconstruct" pipeline of recovering content from a masked image, has recently attracted increasing interest in the multimedia community owing to its strong ability to learn visual representations from unlabeled data. Aiming to learn representations that abstract high-level semantics, one group of works attempts to reconstruct non-semantic pixels under a large-ratio masking strategy, which may suffer from an "over-smoothing" problem, while others directly infuse semantics into the targets in an off-line way that requires extra data. In contrast, we shift the perspective to the Fourier domain, which naturally has a global perspective, and present a new MIM method, termed Geminated Gestalt Autoencoder (Ge$^2$-AE), for visual pre-training. Specifically, we equip our model with geminated decoders in charge of reconstructing image content from both pixel space and frequency space, where each serves the other not only as a complement but also as a reciprocal constraint. In this way, more robust representations can be learned in the pre-trained encoder, and their effectiveness is confirmed by experimental results on downstream recognition tasks. We also conduct several quantitative and qualitative experiments to investigate the learning behavior of our method. To the best of our knowledge, this is the first MIM work to approach visual pre-training through the lens of the frequency domain.
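To make the "global perspective" of the Fourier domain concrete, the following is a minimal sketch, not the authors' implementation. It assumes a plain mean-squared error in each space and a hypothetical weighting parameter `weight`; none of the function names come from the paper. It shows that perturbing a single pixel changes every frequency coefficient, so a frequency-space reconstruction target couples all spatial positions.

```python
import cmath


def dft2(image):
    """Naive 2D discrete Fourier transform of a list-of-lists image."""
    h, w = len(image), len(image[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2 * cmath.pi * (u * y / h + v * x / w)
                    total += image[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def pixel_loss(pred, target):
    """Mean squared error in pixel space (an assumed loss, for illustration)."""
    h, w = len(target), len(target[0])
    return sum((pred[y][x] - target[y][x]) ** 2
               for y in range(h) for x in range(w)) / (h * w)


def frequency_loss(pred, target):
    """Mean squared magnitude of the spectrum difference (also assumed)."""
    fp, ft = dft2(pred), dft2(target)
    h, w = len(target), len(target[0])
    return sum(abs(fp[u][v] - ft[u][v]) ** 2
               for u in range(h) for v in range(w)) / (h * w)


def combined_loss(pred, target, weight=0.5):
    """Pixel loss plus a weighted frequency loss; `weight` is hypothetical."""
    return pixel_loss(pred, target) + weight * frequency_loss(pred, target)


def count_changed(a, b, tol=1e-9):
    """Count entries that differ between two equally sized 2D grids."""
    return sum(1 for row_a, row_b in zip(a, b)
               for va, vb in zip(row_a, row_b) if abs(va - vb) > tol)


# A 4x4 checkerboard target and a prediction that is wrong at one pixel.
target = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
pred = [row[:] for row in target]
pred[1][2] += 1.0

print("pixels changed:      ", count_changed(pred, target))
print("coefficients changed:", count_changed(dft2(pred), dft2(target)))
print("combined loss:       ", combined_loss(pred, target))
```

The naive transform is quadratic in the number of pixels and is used here only to keep the sketch dependency-free; a practical implementation would use a fast Fourier transform from a numerical library.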

Related content

Autoencoder

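An autoencoder is an artificial neural network used to learn efficient data encodings in an unsupervised manner. Its aim is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal "noise". Alongside the reduction side, a reconstruction side is learned, in which the autoencoder tries to generate from the reduced encoding a representation as close as possible to its original input, hence its name. Several variants of the basic model exist, each aiming to force the learned representation of the input to take on useful properties. Autoencoders are applied effectively to many problems, from facial recognition to capturing the semantic meaning of words.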
