Embedding tables dominate the size of industrial-scale recommendation models, consuming up to terabytes of memory. A popular benchmark, and the largest publicly available MLPerf machine learning benchmark on recommendation data, is the Deep Learning Recommendation Model (DLRM) trained on a terabyte of click-through data; it contains 100GB of embedding memory (25+ billion parameters). Due to their sheer size and the associated volume of data, DLRMs are difficult to train and deploy for inference, and their large embedding tables create memory bottlenecks. This paper analyzes and extensively evaluates a generic parameter sharing setup (PSS) for compressing DLRM models. We show theoretical upper bounds on the learnable memory required to achieve a $(1 \pm \epsilon)$ approximation of the embedding table; our bounds indicate that exponentially fewer parameters suffice for good accuracy. Empirically, we demonstrate a PSS DLRM reaching 10000$\times$ compression on criteo-tb without losing quality. This compression, however, comes with a caveat: it requires 4.5$\times$ more iterations to reach the same saturation quality. We argue that this tradeoff deserves further investigation, as it may be significantly favorable. Leveraging the small size of the compressed model, we show a 4.3$\times$ improvement in training latency, leading to similar overall training times. Thus, in the tradeoff between the system advantages of a small DLRM model and its slower convergence, we show that the scales are tipped towards the smaller DLRM model, yielding faster inference, easier deployment, and similar training times.
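To make the idea concrete, one common way to instantiate such a parameter sharing setup is hash-based sharing; the notation below is an illustrative sketch under that assumption, not necessarily the paper's exact construction. Instead of storing a full embedding table $E \in \mathbb{R}^{n \times d}$, one keeps a single shared parameter vector $w \in \mathbb{R}^{m}$ with $m \ll nd$ and reconstructs each entry on the fly,
\[
E[i][j] \;=\; g(i,j)\, w\big[h(i,j)\big], \qquad w \in \mathbb{R}^{m},\; m \ll nd,
\]
where $h:[n]\times[d]\to[m]$ and $g:[n]\times[d]\to\{-1,+1\}$ are fixed (non-learned) random hash functions and only $w$ is trained. The learnable memory is then $m$, independent of the nominal table size $n \times d$, which is the sense in which a 10000$\times$ compression of the embedding parameters is possible.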