This paper presents a new perspective on self-supervised learning based on extending the heat equation into a high-dimensional feature space. In particular, we remove the time dependence by imposing a steady-state condition, and extend the remaining 2D Laplacian from x--y isotropic to linearly correlated. We further simplify it by splitting the x and y axes into two first-order linear differential equations. This simplification explicitly models spatial invariance along the horizontal and vertical directions separately, which supports prediction across image blocks. The result is a very simple masked image modeling (MIM) method named QB-Heat. QB-Heat leaves a single block, one quarter of the image in size, unmasked and linearly extrapolates the other three masked quarters. It brings MIM to CNNs without bells and whistles, and even works well for pre-training lightweight networks that are suitable for both image classification and object detection without fine-tuning. Compared with MoCo-v2 when pre-training a Mobile-Former with 5.8M parameters and 285M FLOPs, QB-Heat is on par in linear probing on ImageNet but clearly outperforms it in non-linear probing, which adds a transformer block before the linear classifier (65.6% vs. 52.9%). When transferring to object detection with a frozen backbone, QB-Heat outperforms MoCo-v2 and supervised pre-training on ImageNet by 7.9 and 4.5 AP, respectively. This work provides an insightful hypothesis on the invariance within visual representations across different shapes and textures: a linear relationship between horizontal and vertical derivatives. The code will be publicly released.
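The quarter-block masking and linear extrapolation described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the two first-order equations take the form dz/dx = A z and dz/dy = B z for a feature vector z, and that a masked position is predicted by a first-order step from a visible one. The names `quarter_block_mask`, `extrapolate`, `A`, and `B`, as well as the free placement of the visible block, are hypothetical choices made for illustration only.

```python
def quarter_block_mask(height, width, top, left):
    """Return a height x width grid of booleans (True = visible, False = masked).

    The visible block is (height // 2) x (width // 2), i.e. one quarter of the
    grid, with its top-left corner at (top, left). Where the block is placed is
    an assumption of this sketch.
    """
    block_h, block_w = height // 2, width // 2
    if not (0 <= top <= height - block_h and 0 <= left <= width - block_w):
        raise ValueError("visible block must lie inside the grid")
    return [
        [top <= r < top + block_h and left <= c < left + block_w
         for c in range(width)]
        for r in range(height)
    ]


def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector (list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def extrapolate(z, A, B, dx, dy):
    """First-order prediction of the feature at offset (dx, dy) from z.

    Assumes dz/dx = A z and dz/dy = B z (hypothetical form), so that
    z(x + dx, y + dy) is approximated by z + dx * A z + dy * B z.
    """
    az = matvec(A, z)
    bz = matvec(B, z)
    return [zi + dx * ai + dy * bi for zi, ai, bi in zip(z, az, bz)]


# Usage: a 4x4 grid of positions with the top-left quarter left visible.
mask = quarter_block_mask(4, 4, top=0, left=0)
visible = sum(cell for row in mask for cell in row)

# Toy 2-channel feature and toy derivative matrices (illustrative values).
z = [1, 2]
A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
predicted = extrapolate(z, A, B, dx=2, dy=3)
```

In a trained model the matrices would be learned rather than fixed by hand, and the prediction would be compared with the encoder's features at the masked positions. Both points are assumptions of this sketch and are not stated in the abstract.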