Ensembles of deep neural networks are known to achieve state-of-the-art performance in uncertainty estimation and to improve accuracy. In this work, we focus on the classification setting and investigate the behavior of both the non-calibrated and the calibrated negative log-likelihood (CNLL) of a deep ensemble as a function of the ensemble size and the member network size. We indicate the conditions under which CNLL follows a power law with respect to the ensemble size or the member network size, and analyze the dynamics of the parameters of the discovered power laws. Our important practical finding is that one large network may perform worse than an ensemble of several medium-size networks with the same total number of parameters (we call such an ensemble a memory split). Using the detected power law-like dependencies, we can predict (1) the possible gain from ensembling networks of a given structure and (2) the optimal memory split for a given memory budget, based on a relatively small number of trained networks. We describe the memory split advantage effect in more detail in arXiv:2005.07292.
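To illustrate the kind of extrapolation the abstract refers to, here is a minimal sketch of fitting a saturating power law to CNLL as a function of ensemble size and using it to predict the gain from a larger ensemble. The parameterization CNLL(n) = c_inf + b * n^(-a), the specific CNLL values, and the ensemble sizes below are illustrative assumptions, not results or code from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical measurements: CNLL of ensembles with n member networks.
# These values are placeholders for illustration only.
ensemble_sizes = np.array([1, 2, 3, 4, 5, 8])
cnll_values = np.array([0.310, 0.272, 0.256, 0.247, 0.241, 0.233])

def power_law(n, c_inf, b, a):
    """Assumed form: CNLL(n) = c_inf + b * n**(-a), saturating in ensemble size n."""
    return c_inf + b * n ** (-a)

# Fit the power law using a relatively small number of trained networks ...
params, _ = curve_fit(power_law, ensemble_sizes, cnll_values, p0=(0.2, 0.1, 1.0))
c_inf, b, a = params

# ... then extrapolate to larger ensembles, e.g. to compare candidate
# memory splits under a fixed total parameter budget.
predicted_cnll_20 = power_law(20, *params)
print(f"fitted CNLL(n) = {c_inf:.3f} + {b:.3f} * n^(-{a:.3f})")
print(f"predicted CNLL for a 20-member ensemble: {predicted_cnll_20:.3f}")
```

In this sketch, the member network size is held fixed; comparing memory splits would mean repeating the fit for several member sizes and selecting the split whose predicted CNLL is lowest under the memory budget.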