Ensemble methods are widely used to improve the generalization performance of machine learning models, but they are difficult to apply in deep learning systems, as training an ensemble of deep neural networks (DNNs) incurs an extremely high computational overhead. Recently, advanced techniques such as fast geometric ensembling (FGE) and snapshot ensembles have been proposed. These methods can train a model ensemble in the same time required to train a single model, thereby getting around the hurdle of training time. However, their memory overhead for test-time inference remains much higher than that of single-model-based methods. Here we propose parsimonious FGE (PFGE), which employs a lightweight ensemble of higher-performing DNNs generated by successively performed stochastic weight averaging (SWA) procedures. Experimental results across different advanced DNN architectures on the benchmark datasets CIFAR-$\{10,100\}$ and ImageNet demonstrate that PFGE matches the state-of-the-art FGE method in terms of generalization error, yet requires only 20% of its memory overhead for test-time inference. Our code is available at https://github.com/ZJLAB-AMMI/PFGE.
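To make the idea of "successively performed SWA procedures" concrete, the following is a minimal PyTorch sketch, not the authors' exact recipe (see the repository above for that). The hyperparameters (`num_cycles`, `epochs_per_cycle`, `base_lr`), the simple constant learning rate, and the helper `train_one_epoch` are illustrative assumptions; the paper's method additionally relies on FGE-style cyclical learning rates, which are omitted here for brevity.

```python
# Sketch of the PFGE idea: run several SWA procedures back to back; each
# procedure yields one averaged model, which joins the ensemble and also
# warm-starts the next procedure. Details here are assumptions, not the
# authors' exact training protocol.
import copy
import torch


def train_one_epoch(model, loader, optimizer, device="cpu"):
    """One standard SGD pass over the training data."""
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()


def run_pfge(model, loader, num_cycles=4, epochs_per_cycle=10, base_lr=0.05):
    """Generate a lightweight ensemble via successive SWA procedures."""
    ensemble = []
    for _ in range(num_cycles):
        swa_model = torch.optim.swa_utils.AveragedModel(model)
        optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)
        for _ in range(epochs_per_cycle):
            train_one_epoch(model, loader, optimizer)
            swa_model.update_parameters(model)          # accumulate running weight average
        torch.optim.swa_utils.update_bn(loader, swa_model)  # recompute BatchNorm statistics
        ensemble.append(copy.deepcopy(swa_model))
        # Warm-start the next SWA procedure from the averaged weights.
        model.load_state_dict(swa_model.module.state_dict())
    return ensemble


def ensemble_predict(ensemble, x):
    """Average the softmax outputs of the ensemble members at test time."""
    with torch.no_grad():
        probs = [torch.softmax(m(x), dim=-1) for m in ensemble]
    return torch.stack(probs).mean(dim=0)
```

Because each ensemble member is itself a weight-averaged model, a small number of members (here `num_cycles`) can suffice, which is what keeps the test-time memory footprint low relative to FGE's larger collection of snapshots.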