Model hubs with many pre-trained models (PTMs) have become a cornerstone of deep learning. Although built at high cost, they remain \emph{under-exploited}: practitioners usually pick one PTM from a model hub by popularity and then fine-tune it to solve the target task. This na\"ive but common practice poses two obstacles to fully exploiting pre-trained model hubs: (1) selecting a PTM by popularity carries no optimality guarantee; (2) only one PTM is used while the remaining PTMs are ignored. Ideally, exploiting a model hub maximally would require trying all combinations of PTMs and extensively fine-tuning each combination, but the number of combinations grows exponentially and the computational cost is unaffordable. In this paper, we propose a new paradigm for exploiting model hubs by ranking and tuning pre-trained models: (1) Our conference paper~\citep{you_logme:_2021} proposed LogME, which estimates the maximum value of label evidence given the features extracted by a pre-trained model and can therefore rank all the PTMs in a model hub, across various types of PTMs and tasks, \emph{before fine-tuning}. (2) The best-ranked PTM can be fine-tuned and deployed if we have no preference for the model's architecture; otherwise, the target PTM can be tuned with the top-K ranked PTMs via the proposed B-Tuning algorithm. The ranking part is based on the conference paper, and we complete its theoretical analysis in this paper, including a convergence proof for the heuristic evidence maximization procedure and an analysis of the influence of feature dimension. The tuning part introduces a novel Bayesian Tuning (B-Tuning) method for tuning multiple PTMs, which surpasses specialized methods designed for tuning homogeneous PTMs and sets a new state of the art for tuning heterogeneous PTMs. We expect this new paradigm of exploiting PTM hubs to interest a broad audience across the machine learning community.
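To make the ranking idea concrete, the following is a minimal sketch of evidence-based ranking, not the authors' implementation. It assumes a Bayesian linear model on top of frozen features (weights with prior precision `alpha`, Gaussian noise with precision `beta`) and maximizes the log marginal evidence with the standard fixed-point updates for such models. The function names `max_log_evidence` and `rank_models`, the initial values, and the stopping rule are all hypothetical choices made for illustration.

```python
import numpy as np


def max_log_evidence(features, target, max_iter=100, tol=1e-5):
    """Per-sample log evidence of `target` given `features`, maximized over
    alpha (prior precision) and beta (noise precision).

    Assumed model (illustrative): target = features @ w + noise,
    w ~ N(0, I / alpha), noise ~ N(0, 1 / beta).
    """
    n, d = features.shape
    # SVD gives the eigenvalues of F^T F and the target's coordinates
    # along the left singular vectors.
    u, s, _ = np.linalg.svd(features, full_matrices=False)
    sigma = s ** 2
    z2 = (u.T @ target) ** 2
    # Squared norm of the part of the target outside the column space of F.
    res0 = max(float(target @ target - z2.sum()), 0.0)

    def stats(alpha, beta):
        denom = alpha + beta * sigma
        m2 = np.sum(beta ** 2 * sigma * z2 / denom ** 2)     # ||m||^2
        res2 = np.sum(alpha ** 2 * z2 / denom ** 2) + res0   # ||F m - y||^2
        return denom, m2, res2

    alpha, beta = 1.0, 1.0
    for _ in range(max_iter):
        denom, m2, res2 = stats(alpha, beta)
        gamma = np.sum(beta * sigma / denom)  # effective number of parameters
        new_alpha = gamma / (m2 + 1e-12)
        new_beta = (n - gamma) / (res2 + 1e-12)
        converged = (abs(new_alpha - alpha) / alpha
                     + abs(new_beta - beta) / beta) < tol
        alpha, beta = new_alpha, new_beta
        if converged:
            break

    denom, m2, res2 = stats(alpha, beta)
    # log det(alpha * I + beta * F^T F), including zero eigenvalues when d > n.
    log_det_a = np.sum(np.log(denom)) + (d - len(sigma)) * np.log(alpha)
    evidence = 0.5 * (n * np.log(beta) + d * np.log(alpha)
                      - n * np.log(2 * np.pi)
                      - beta * res2 - alpha * m2 - log_det_a)
    return evidence / n


def rank_models(feature_sets, targets):
    """Rank models by mean evidence over target columns, best first.

    `feature_sets` maps a model name to an (n, d) feature matrix;
    `targets` is (n,) or (n, C), e.g. regression values or one-hot labels.
    """
    targets = np.asarray(targets, dtype=float)
    if targets.ndim == 1:
        targets = targets[:, None]
    scores = {}
    for name, feats in feature_sets.items():
        feats = np.asarray(feats, dtype=float)
        per_column = [max_log_evidence(feats, targets[:, c])
                      for c in range(targets.shape[1])]
        scores[name] = float(np.mean(per_column))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch, each PTM needs only one forward pass over the target data to produce its feature matrix; no fine-tuning is involved, which is what allows ranking a whole hub cheaply. Feature dimensions may differ between models, since each model is scored independently.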