Ensembling neural networks is an effective way to increase accuracy, and can often match the performance of larger models. This observation poses a natural question: given the choice between a deep ensemble and a single neural network with similar accuracy, is one preferable to the other? Recent work suggests that deep ensembles may offer benefits beyond predictive power: namely, uncertainty quantification and robustness to dataset shift. In this work, we demonstrate limitations to these purported benefits, and show that a single (but larger) neural network can replicate these qualities. First, we show that ensemble diversity, by any metric, does not meaningfully contribute to an ensemble's ability to detect out-of-distribution (OOD) data, and that one can estimate ensemble diversity by measuring the relative improvement of a single larger model. Second, we show that the OOD performance afforded by ensembles is strongly determined by their in-distribution (InD) performance and, in this sense, is not indicative of any "effective robustness." While deep ensembles are a practical way to achieve performance improvements (in agreement with prior work), our results suggest that they may be a tool of convenience rather than a fundamentally better model class.
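For concreteness, the minimal sketch below (not from the paper; all names are illustrative) shows the quantities the abstract refers to: a deep ensemble's averaged prediction, predictive entropy as a standard OOD-detection score, and one common diversity metric, pairwise disagreement between members.

```python
import numpy as np

def ensemble_predict(member_probs):
    """Deep-ensemble prediction: average the members' softmax outputs.
    member_probs: array of shape (n_members, n_examples, n_classes)."""
    return member_probs.mean(axis=0)

def predictive_entropy(probs, eps=1e-12):
    """Entropy of the (averaged) predictive distribution; a standard
    uncertainty score for flagging OOD inputs (higher = more uncertain)."""
    return -(probs * np.log(probs + eps)).sum(axis=-1)

def pairwise_disagreement(member_probs):
    """One diversity metric: how often two members' predicted labels
    differ, averaged over all member pairs."""
    labels = member_probs.argmax(axis=-1)  # (n_members, n_examples)
    n = labels.shape[0]
    rates = [(labels[i] != labels[j]).mean()
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(rates))

# Example usage with random softmax outputs (4 members, 1000 examples,
# 10 classes); real use would substitute each member's test-set outputs.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 1000, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(predictive_entropy(ensemble_predict(probs)).shape)  # (1000,)
print(pairwise_disagreement(probs))
```

The paper's first claim concerns how scores like `pairwise_disagreement` relate to OOD detection via scores like `predictive_entropy`; the sketch only fixes the definitions, not the paper's experimental protocol.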