Scale has been a major driving force behind improvements in machine learning performance, and understanding scaling laws is essential for strategic planning toward sustainable growth in model quality, for long-term resource planning, and for developing efficient system infrastructures to support large-scale models. In this paper, we study empirical scaling laws for DLRM-style recommendation models, in particular Click-Through Rate (CTR) models. We observe that model quality scales as a power law plus a constant in model size, data size, and the amount of compute used for training. We characterize scaling efficiency along three resource dimensions, namely data, parameters, and compute, by comparing scaling schemes along these axes. We show that parameter scaling has run out of steam for the model architecture under study, and that until a higher-performing model architecture emerges, data scaling is the path forward. The key research questions addressed by this study include: Does a recommendation model scale sustainably as predicted by the scaling laws, or are we far off from those predictions? What are the limits of scaling? What are the implications of the scaling laws for long-term hardware and system development?
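To make the "power law plus constant" form concrete, the following is a minimal sketch, not code from the paper, of how such a scaling curve quality(x) ≈ a·x^(-b) + c could be fit along one resource axis (data, parameters, or compute). All data points, parameter values, and names below are illustrative assumptions.

```python
# Illustrative sketch (not from the paper): fit a "power law plus constant"
# scaling curve, quality(x) ≈ a * x^(-b) + c, to hypothetical measurements
# along one resource axis. All numbers below are made up for exposition.
import numpy as np
from scipy.optimize import curve_fit

def power_law_plus_constant(x, a, b, c):
    # Model quality as a function of a resource axis x
    # (data size, parameter count, or training compute).
    return a * np.power(x, -b) + c

# Hypothetical (resource, quality-metric) points, e.g. a loss-like metric
# measured at increasing data sizes.
x = np.array([1e8, 3e8, 1e9, 3e9, 1e10])
y = np.array([0.812, 0.806, 0.801, 0.798, 0.796])

# Rescale x so the fit is numerically well conditioned.
x_scaled = x / x.min()
(a, b, c), _ = curve_fit(power_law_plus_constant, x_scaled, y, p0=(0.02, 0.5, 0.79))
print(f"quality ≈ {a:.3g} * (x/x_min)^(-{b:.3g}) + {c:.3g}")
# The fitted constant c is the asymptote the metric approaches as the
# resource grows, i.e. the irreducible error implied by the scaling law.
```

The constant term is what distinguishes this form from a pure power law: it determines how much headroom remains as any single resource axis is scaled up, which is central to the question of whether parameter or data scaling eventually saturates.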