Despite being able to capture a range of features of the data, high-accuracy models trained with supervision tend to make similar predictions. This seemingly implies that high-performing models share similar biases regardless of training methodology, which would limit ensembling benefits and leave low-accuracy models with little practical use. Against this backdrop, recent work has shown that very different training techniques, such as large-scale contrastive learning, can yield competitively high accuracy on generalization and robustness benchmarks. This motivates us to revisit the assumption that models necessarily learn similar functions. We conduct a large-scale empirical study of models across hyper-parameters, architectures, frameworks, and datasets. We find that model pairs that diverge more in training methodology display categorically different generalization behavior, producing increasingly uncorrelated errors. We show that these models specialize in subdomains of the data, leading to higher ensemble performance: with just two models (each with ImageNet accuracy of ~76.5%), we can create ensembles that reach 83.4% (a +7% boost). Surprisingly, we find that even significantly lower-accuracy models can be used to improve high-accuracy models. Finally, we show that diverging training methodologies yield representations that capture overlapping (but not supersetting) feature sets which, when combined, lead to increased downstream performance.
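To make the ensembling claim concrete, the sketch below shows the standard way two independently trained classifiers can be combined by averaging their softmax probabilities. The specific torchvision models used here are illustrative stand-ins, not the model pairs studied in the paper, which combines models trained with more divergent methodologies (e.g., supervised vs. contrastive).

```python
# Minimal sketch: two-model ensemble via probability averaging.
# Model choices are hypothetical examples, not the paper's exact pairs.
import torch
import torchvision.models as models

model_a = models.resnet50(weights="IMAGENET1K_V1").eval()      # one architecture
model_b = models.densenet201(weights="IMAGENET1K_V1").eval()   # a different architecture

@torch.no_grad()
def ensemble_predict(images: torch.Tensor) -> torch.Tensor:
    """Average class probabilities from both models and return predicted labels."""
    probs_a = torch.softmax(model_a(images), dim=-1)
    probs_b = torch.softmax(model_b(images), dim=-1)
    return (probs_a + probs_b).argmax(dim=-1)
```

If the two members make uncorrelated errors, the averaged prediction can exceed either member's accuracy, which is the effect the reported 76.5% -> 83.4% gain illustrates.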