With sporadic exceptions, the recent trend in Computer Vision has been to achieve minor improvements at the cost of considerable increases in complexity. To reverse this tendency, we propose a novel method to boost image classification performance without increasing complexity. To this end, we revisit ensembling, a powerful approach that is rarely used to full effect because it increases complexity and training time, and make it viable through specific design choices. First, we train two EfficientNet-b0 models end-to-end (EfficientNet-b0 is regarded as the architecture with the best overall accuracy/complexity trade-off in image classification) on disjoint subsets of the data (i.e. bagging). Then, we build an efficient adaptive ensemble by fine-tuning a trainable combination layer. In this way, we outperform the state of the art by an average of 0.5\% in accuracy on several major benchmark datasets while keeping complexity restrained, both in number of parameters (5-60 times fewer) and in number of floating-point operations (FLOPs; 10-100 times fewer), fully embracing green AI.
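As a rough illustration of the two steps described above (disjoint data subsets, then a trainable combination layer on top of frozen base models), the following minimal Python sketch uses toy stand-ins and is not the implementation behind the reported results. All names (`disjoint_split`, `CombinationLayer`, `base_a`, `base_b`, `train_toy_ensemble`) and the toy data are hypothetical; the two base functions stand in for the trained EfficientNet-b0 models, and feeding their concatenated outputs to a single linear layer is an assumption made for illustration.

```python
import math
import random


def disjoint_split(indices, n_parts, seed=0):
    """Shuffle indices and deal them into n_parts non-overlapping subsets."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


class CombinationLayer:
    """Linear + softmax layer over the concatenated outputs of frozen base models."""

    def __init__(self, in_dim, n_classes, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.01, 0.01) for _ in range(in_dim)]
                  for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def forward(self, x):
        logits = [sum(w * v for w, v in zip(row, x)) + b
                  for row, b in zip(self.W, self.b)]
        return softmax(logits)

    def sgd_step(self, x, y, lr=0.1):
        """One SGD step on cross-entropy; only this layer is updated."""
        p = self.forward(x)
        for c, row in enumerate(self.W):
            grad = p[c] - (1.0 if c == y else 0.0)  # d(loss)/d(logit_c)
            for j in range(len(row)):
                row[j] -= lr * grad * x[j]
            self.b[c] -= lr * grad
        return -math.log(max(p[y], 1e-12))


# Hypothetical frozen base models: each one only "sees" one input feature.
def base_a(x):
    return [x[0], -x[0]]


def base_b(x):
    return [x[1], -x[1]]


def train_toy_ensemble(n=200, epochs=20, seed=0):
    rng = random.Random(seed)
    points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    labels = [0 if x0 + x1 > 0 else 1 for x0, x1 in points]
    # In the real setting each base model would be trained on its own subset;
    # here the base models are fixed, so the subsets are only shown.
    subset_a, subset_b = disjoint_split(range(n), 2, seed)
    feats = [base_a(p) + base_b(p) for p in points]  # concatenated outputs
    layer = CombinationLayer(in_dim=4, n_classes=2, seed=seed)
    losses = []
    for _ in range(epochs):
        total = sum(layer.sgd_step(x, y) for x, y in zip(feats, labels))
        losses.append(total / n)
    correct = 0
    for x, y in zip(feats, labels):
        probs = layer.forward(x)
        correct += int(probs.index(max(probs)) == y)
    return losses, correct / n


if __name__ == "__main__":
    losses, accuracy = train_toy_ensemble()
    print(f"first-epoch loss {losses[0]:.3f}, last-epoch loss {losses[-1]:.3f}")
    print(f"training accuracy {accuracy:.2f}")
```

Because the base models stay frozen, only the small combination layer is trained in the second stage, which is what keeps the extra training cost of the ensemble low in this sketch.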