Siamese networks are among the most popular methods for self-supervised visual representation learning (SSL). Since hand labeling is costly, SSL can play a crucial role by allowing deep learning models to train on large unlabeled datasets. Meanwhile, Neural Architecture Search (NAS) is becoming increasingly important as a technique to discover novel deep learning architectures. However, early NAS methods based on reinforcement learning or evolutionary algorithms suffered from prohibitive computational and memory costs. In contrast, differentiable NAS, a gradient-based approach, is much more efficient and has thus attracted most of the attention in recent years. In this article, we present NASiam, a novel approach that is the first to use differentiable NAS to improve the multilayer perceptron projector and predictor (encoder/predictor pair) architectures inside siamese-networks-based contrastive learning frameworks (e.g., SimCLR, SimSiam, and MoCo) while preserving the simplicity of previous baselines. We craft a search space designed explicitly for multilayer perceptrons, inside which we explore several alternatives to the standard ReLU activation function. We show that these new architectures allow convolutional models with ResNet backbones to learn strong representations efficiently. NASiam reaches competitive performance on both small-scale (i.e., CIFAR-10/CIFAR-100) and large-scale (i.e., ImageNet) image classification datasets while costing only a few GPU hours. We discuss the composition of the NAS-discovered architectures and offer hypotheses on why they manage to prevent collapsing behavior. Our code is available at https://github.com/aheuillet/NASiam.
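To make the core idea more concrete, the sketch below shows one way a DARTS-style continuous relaxation over candidate activation functions could be wired into a SimSiam-like MLP predictor in PyTorch. This is a minimal illustration under assumed choices, not the paper's actual search space: the candidate set, layer dimensions, and class names (`MixedActivation`, `SearchablePredictor`) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical candidate set of activations to search over; the paper's
# actual search space may differ.
CANDIDATE_ACTIVATIONS = {
    "relu": nn.ReLU(),
    "hardswish": nn.Hardswish(),
    "silu": nn.SiLU(),
    "identity": nn.Identity(),
}


class MixedActivation(nn.Module):
    """Softmax-weighted sum of candidate activations (DARTS-style relaxation)."""

    def __init__(self):
        super().__init__()
        # One architecture parameter (alpha) per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(CANDIDATE_ACTIVATIONS)))
        self.ops = nn.ModuleList(CANDIDATE_ACTIVATIONS.values())

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))


class SearchablePredictor(nn.Module):
    """Two-layer MLP predictor whose hidden activation is searched rather than fixed to ReLU."""

    def __init__(self, dim=2048, hidden_dim=512):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Linear(dim, hidden_dim, bias=False),
            nn.BatchNorm1d(hidden_dim),
            MixedActivation(),  # searched activation instead of a fixed ReLU
        )
        self.layer2 = nn.Linear(hidden_dim, dim)

    def forward(self, x):
        return self.layer2(self.layer1(x))
```

In a bi-level search of this kind, the architecture parameters (the alphas) would be optimized alongside the network weights, and the strongest candidate per mixed operation would be kept when deriving the final predictor architecture.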