Training deep networks and tuning hyperparameters on large datasets is computationally intensive. One of the primary research directions for efficient training is to reduce training costs by selecting well-generalizing subsets of the training data. Compared to simple adaptive random subset selection baselines, existing intelligent subset selection approaches are not competitive because of their time-consuming subset selection step, which requires computing model-dependent gradients and feature embeddings and then greedily maximizing submodular objectives. Our key insight is that removing the reliance on downstream model parameters enables subset selection to be performed as a pre-processing step, allowing multiple models to be trained at no additional selection cost. In this work, we propose MILO, a model-agnostic subset selection framework that decouples subset selection from model training while enabling superior model convergence and performance through an easy-to-hard curriculum. Our empirical results indicate that MILO can train models $3\times$ to $10\times$ faster and tune hyperparameters $20\times$ to $75\times$ faster than full-dataset training or tuning, without compromising performance.
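To make the decoupling concrete, the sketch below illustrates the general idea under stated assumptions: per-example difficulty is scored once with a model-agnostic criterion (e.g., derived from off-the-shelf feature embeddings), and one subset per epoch is pre-computed with an easy-to-hard schedule. The names `make_curriculum_subsets`, `difficulty`, and `train_one_epoch`, as well as the linear schedule, are illustrative placeholders rather than MILO's actual algorithm or API.

```python
# Minimal, hypothetical sketch of pre-processing-time, model-agnostic subset
# selection with an easy-to-hard curriculum. Not MILO's implementation.
import numpy as np

def make_curriculum_subsets(difficulty, subset_size, num_epochs, rng=None):
    """Pre-compute one training subset per epoch.

    difficulty : 1-D array of model-agnostic difficulty scores (higher = harder).
    Because no downstream model parameters are used, this runs once as a
    pre-processing step and the resulting subsets can be reused across models
    and hyperparameter configurations.
    """
    rng = rng or np.random.default_rng(0)
    n = len(difficulty)
    order = np.argsort(difficulty)  # indices sorted from easiest to hardest
    subsets = []
    for epoch in range(num_epochs):
        # Linearly grow the candidate pool from the easiest examples toward
        # the full dataset (an assumed, illustrative easy-to-hard schedule).
        frac = 0.3 + 0.7 * epoch / max(1, num_epochs - 1)
        pool = order[: max(subset_size, int(frac * n))]
        subsets.append(rng.choice(pool, size=subset_size, replace=False))
    return subsets

# Usage (hypothetical trainer): the same pre-computed subsets drive every
# training or tuning run, so the selection cost is paid only once.
# difficulty = compute_embedding_based_scores(dataset)   # model-agnostic, done once
# subsets = make_curriculum_subsets(difficulty, subset_size=5000, num_epochs=50)
# for epoch, idx in enumerate(subsets):
#     train_one_epoch(model, dataset, idx)
```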