Although much of the success of Deep Learning builds on learning good representations, a rigorous method to evaluate their quality is lacking. In this paper, we treat the evaluation of representations as a model selection problem and propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric. Contrary to the established practice of limiting the capacity of the readout model, we design a hybrid discrete and continuous-valued model space for the readout models and employ a switching strategy to combine their predictions. The MDL score takes both model complexity and data efficiency into account. As a result, the most appropriate model for the specific task and representation is chosen, making the score a unified measure for comparison. The proposed metric can be computed efficiently with an online method, and we present results for pre-trained vision encoders of various architectures (ResNet and ViT) and objective functions (supervised and self-supervised) on a range of downstream tasks. We compare our method with accuracy-based approaches and show that the latter are inconsistent when multiple readout models are used. Finally, we discuss important properties revealed by our evaluations, such as model scaling, the preferred readout model, and data efficiency.
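To make the online computation concrete, below is a minimal sketch of a prequential (predict-then-train) MDL score with readout-model switching. The function name `prequential_mdl_score`, the `log_prob`/`update` interface on the readout models, and the `switch_prior` parameter are illustrative assumptions, and the fixed-share reweighting rule is a standard stand-in for a switching strategy, not necessarily the exact one used in the paper.

```python
import numpy as np

def prequential_mdl_score(chunks, readout_models, switch_prior=0.01):
    """Sketch of an online (prequential) MDL score with model switching.

    Assumes each readout model exposes hypothetical `log_prob(x, y)`
    (log-likelihood of a data chunk) and `update(x, y)` (training step)
    methods. Each chunk is encoded with a switching mixture over the
    readout models *before* any model is trained on it, so the score
    reflects both fit and data efficiency.
    """
    k = len(readout_models)
    log_w = np.full(k, -np.log(k))   # uniform prior over readout models
    total_codelength = 0.0           # cumulative description length, in nats

    for x, y in chunks:
        # Per-model codelength of the incoming chunk (predict first).
        log_p = np.array([m.log_prob(x, y) for m in readout_models])

        # Mixture codelength: -log sum_m w_m * p_m(chunk).
        joint = log_w + log_p
        log_mix = np.logaddexp.reduce(joint)
        total_codelength += -log_mix

        # Posterior reweighting plus a fixed-share switch step, which
        # allows the preferred readout model to change as data accrues.
        post = np.exp(joint - log_mix)
        post = (1.0 - switch_prior) * post + switch_prior / k
        log_w = np.log(post)

        # Only now are the models allowed to train on the revealed chunk.
        for m in readout_models:
            m.update(x, y)

    return total_codelength
```

Because every chunk is scored before the models see it, a representation that lets simple readout models fit quickly earns a short codelength early on, which is how data efficiency enters the score alongside model complexity.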