Deep learning-based methods have achieved remarkable progress on the image classification task across a wide range of commonly used datasets (ImageNet, CIFAR, SVHN, Caltech 101, SUN397, etc.). State-of-the-art performance on each of these datasets is obtained by carefully tuning the model architecture and training tricks to the properties of the target data. Although this approach allows setting academic records, it is unrealistic for an average data scientist to have enough resources to build a sophisticated training pipeline for every image classification task encountered in practice. This work focuses on reviewing the latest augmentation and regularization methods for image classification and on exploring ways to automatically choose some of the most important hyperparameters: the total number of epochs, the initial learning rate, and its schedule. Given a training procedure that combines a lightweight modern CNN architecture (such as MobileNetV3 or EfficientNet), a sufficient level of regularization, and a data-adaptive learning rate schedule, we can achieve reasonable performance on a variety of downstream image classification tasks without manually tuning parameters for each particular task. The resulting models are computationally efficient and can be deployed to CPU using the OpenVINO toolkit. The source code is available as part of the OpenVINO Training Extensions (https://github.com/openvinotoolkit/training_extensions).
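The abstract does not spell out how the epoch budget and learning rate schedule adapt to the data, so the following is a minimal, hypothetical PyTorch sketch of one common way to achieve this, not the actual OpenVINO Training Extensions implementation: a validation-driven ReduceLROnPlateau schedule decides when to decay the learning rate, and early stopping determines the total number of epochs. The tiny stand-in model, synthetic data, and patience values are placeholders for illustration only.

```python
# Illustrative sketch (assumed setup): data-adaptive LR schedule + early
# stopping, so neither the schedule nor the epoch count is tuned per task.
import torch
from torch import nn
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))  # stand-in classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Decay the LR when the validation metric plateaus instead of using a
# hand-tuned, fixed step schedule.
scheduler = ReduceLROnPlateau(optimizer, mode="max", factor=0.1, patience=5)
criterion = nn.CrossEntropyLoss()

# Synthetic tensors so the sketch runs end to end; replace with real loaders.
x_train, y_train = torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,))
x_val, y_val = torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,))

best_acc, stale, stop_patience = 0.0, 0, 12
for epoch in range(200):                      # generous upper bound, not tuned
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_acc = (model(x_val).argmax(1) == y_val).float().mean().item()
    scheduler.step(val_acc)                   # LR adapts to validation progress
    if val_acc > best_acc:
        best_acc, stale = val_acc, 0
    else:
        stale += 1
    if stale >= stop_patience:                # early stop fixes the epoch budget
        break
```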