In this paper, we propose a novel ensembling technique for deep neural networks, which is able to drastically reduce the required memory compared to alternative approaches. In particular, we propose to extract multiple sub-networks from a single, untrained neural network by solving an end-to-end optimization task that combines differentiable scaling over the original architecture with multiple regularization terms favouring the diversity of the ensemble. Since our proposal aims to detect and extract sub-structures, we call it Structured Ensemble. In a large experimental evaluation, we show that our method can achieve higher or comparable accuracy to competing methods while requiring significantly less storage. In addition, we evaluate our ensembles in terms of predictive calibration and uncertainty, showing that they compare favourably with the state of the art. Finally, we draw a link with the continual learning literature, and we propose a modification of our framework to handle continuous streams of tasks with a sub-linear memory cost. We compare against a number of alternative strategies to mitigate catastrophic forgetting, highlighting advantages in terms of average accuracy and memory.
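To make the core idea concrete, below is a minimal, hypothetical PyTorch sketch of how per-member differentiable scaling vectors and a diversity penalty could be combined during training; all names (`ScaledLinear`, `diversity_penalty`, `n_members`, the 0.1 weight) are illustrative assumptions and do not reflect the paper's actual implementation or hyperparameters.

```python
# A sketch of the idea described above: learnable per-unit scaling vectors
# (one per ensemble member) are trained jointly with the shared backbone,
# and a diversity penalty discourages the members' masks from overlapping.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaledLinear(nn.Module):
    """A linear layer whose output units are gated by per-member scaling vectors."""

    def __init__(self, in_features: int, out_features: int, n_members: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # One differentiable scaling vector per ensemble member.
        self.scales = nn.Parameter(torch.ones(n_members, out_features))

    def forward(self, x: torch.Tensor, member: int) -> torch.Tensor:
        return self.linear(x) * self.scales[member]


def diversity_penalty(scales: torch.Tensor) -> torch.Tensor:
    """Penalise pairwise overlap between the members' soft unit selections."""
    probs = torch.sigmoid(scales)                        # soft "keep" probabilities
    overlap = probs @ probs.t()                          # pairwise inner products
    off_diag = overlap - torch.diag(torch.diag(overlap)) # drop self-overlap
    return off_diag.sum() / (scales.shape[0] * (scales.shape[0] - 1))


# Usage sketch: train the shared layer end-to-end; after training, each member
# would keep only the units whose scale exceeds a threshold, yielding one
# pruned sub-network per ensemble member.
layer = ScaledLinear(32, 64, n_members=4)
x, y = torch.randn(8, 32), torch.randn(8, 64)
loss = sum(F.mse_loss(layer(x, m), y) for m in range(4))
loss = loss + 0.1 * diversity_penalty(layer.scales)
loss.backward()
```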