We devise deep nearest centroids (DNC), a conceptually elegant yet surprisingly effective network for large-scale visual recognition, by revisiting Nearest Centroids, one of the most classic and simple classifiers. Current deep models learn the classifier in a fully parametric manner, ignoring the latent data structure and lacking simplicity and explainability. DNC instead conducts nonparametric, case-based reasoning; it utilizes sub-centroids of training samples to describe class distributions and clearly explains the classification as the proximity of test data and the class sub-centroids in the feature space. Due to the distance-based nature, the network output dimensionality is flexible, and all the learnable parameters are only for data embedding. That means all the knowledge learnt for ImageNet classification can be completely transferred for pixel recognition learning, under the "pre-training and fine-tuning" paradigm. Apart from its nested simplicity and intuitive decision-making mechanism, DNC can even possess ad-hoc explainability when the sub-centroids are selected as actual training images that humans can view and inspect. Compared with parametric counterparts, DNC performs better on image classification (CIFAR-10, ImageNet) and greatly boosts pixel recognition (ADE20K, Cityscapes), with improved transparency and fewer learnable parameters, using various network architectures (ResNet, Swin) and segmentation models (FCN, DeepLabV3, Swin). We feel this work brings fundamental insights into related fields.
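To make the distance-based decision rule concrete, below is a minimal sketch of nearest sub-centroid classification as described above. It is illustrative only and not the paper's implementation; all names (`sub_centroids`, `num_sub_centroids`, `classify`) are hypothetical, and the sub-centroids are random placeholders standing in for features derived from training samples.

```python
import torch

# Hypothetical sizes: 10 classes, 4 sub-centroids per class, 128-d embeddings.
num_classes, num_sub_centroids, dim = 10, 4, 128

# Sub-centroids summarizing each class distribution in the embedding space,
# shape (num_classes, num_sub_centroids, dim). Random placeholders here;
# in DNC they would come from embedded training samples.
sub_centroids = torch.randn(num_classes, num_sub_centroids, dim)


def classify(features: torch.Tensor) -> torch.Tensor:
    """Assign each embedded sample (batch, dim) to the class that owns its
    nearest sub-centroid under Euclidean distance."""
    flat = sub_centroids.reshape(-1, dim)          # (num_classes * K, dim)
    dists = torch.cdist(features, flat)            # (batch, num_classes * K)
    nearest = dists.argmin(dim=1)                  # index of nearest sub-centroid
    # Sub-centroids are stored class-major, so integer division recovers the class.
    return torch.div(nearest, num_sub_centroids, rounding_mode="floor")


# Example: classify 8 embedded test samples.
preds = classify(torch.randn(8, dim))
print(preds)
```

Because the prediction comes from distances to sub-centroids rather than a class-specific fully connected layer, the output dimensionality follows the centroid set, which is what allows the same embedding parameters to be reused across tasks with different label spaces.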