ImageNet serves as the primary dataset for evaluating the quality of computer-vision models. The common practice today is to train each architecture with a tailor-made scheme, designed and tuned by an expert. In this paper, we present a unified scheme for training any backbone on ImageNet. The scheme, named USI (Unified Scheme for ImageNet), is based on knowledge distillation and modern tricks. It requires no adjustments or hyperparameter tuning between different models, and is efficient in terms of training time. We test USI on a wide variety of architectures, including CNNs, Transformers, mobile-oriented, and MLP-only models. On all models tested, USI outperforms previous state-of-the-art results. Hence, we are able to transform training on ImageNet from an expert-oriented task into an automatic, seamless routine. Since USI accepts any backbone and trains it to top results, it also enables methodical comparisons and the identification of the most efficient backbones along the speed-accuracy Pareto curve. Implementation is available at: https://github.com/Alibaba-MIIL/Solving_ImageNet
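For readers unfamiliar with knowledge distillation, the following is a minimal sketch of a generic distillation objective, in which a student model is trained to match a teacher's softened class probabilities. It is not taken from the USI implementation: the function names, the weights `alpha_kd` and `alpha_ce`, and the `temperature` parameter are illustrative assumptions, and the actual loss configuration used by USI may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )


def cross_entropy(student_logits, label):
    """Standard cross-entropy against a hard ground-truth label index."""
    return -math.log(softmax(student_logits)[label])


def distillation_objective(student_logits, teacher_logits, label,
                           alpha_kd=1.0, alpha_ce=0.0, temperature=1.0):
    """Weighted sum of distillation and supervised terms.

    alpha_kd, alpha_ce and temperature are hypothetical knobs chosen for
    illustration; they are not values reported by the paper.
    """
    return (alpha_kd * kd_loss(student_logits, teacher_logits, temperature)
            + alpha_ce * cross_entropy(student_logits, label))
```

In this sketch, the distillation term vanishes when student and teacher logits coincide and grows as their predicted distributions diverge, which is what lets one fixed teacher supervise students of very different architectures.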