There is a recent trend in machine learning to increase model quality by growing models to sizes previously thought to be unreasonable. Recent work has shown that autoregressive generative models with cross-entropy objective functions exhibit smooth power-law relationships, or scaling laws, that predict model quality from model size, training set size, and the available compute budget. These scaling laws allow one to choose nearly optimal hyper-parameters given constraints on available training data, model parameter count, or training computation budget. In this paper, we demonstrate that acoustic models trained with an auto-predictive coding loss behave as if they are subject to similar scaling laws. We extend previous work to jointly predict loss due to model size, to training set size, and to the inherent "irreducible loss" of the task. We find that the scaling laws accurately match model performance over two orders of magnitude in both model size and training set size, and make predictions about the limits of model performance.
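For concreteness, a joint scaling law of the kind described above is typically written as a sum of power-law terms plus a constant floor. The sketch below is illustrative only; the symbols L_inf (irreducible loss), N_c, D_c, alpha_N, and alpha_D are placeholders assumed for this example and are not necessarily the parameterization used in the paper.

% Illustrative joint scaling law: loss as a function of parameter count N and
% training set size D. L_inf is the irreducible loss of the task; N_c, D_c,
% alpha_N, alpha_D are fitted constants (assumed names for this sketch).
\begin{equation}
  L(N, D) \;=\; L_{\infty}
    \;+\; \left(\frac{N_c}{N}\right)^{\alpha_N}
    \;+\; \left(\frac{D_c}{D}\right)^{\alpha_D}
\end{equation}

Under this form, holding D fixed and growing N drives the second term toward zero (and vice versa for the third term), so the predicted loss approaches the irreducible floor L_inf as both model size and training set size increase.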