Deep learning-based models achieve state-of-the-art performance for recommendation systems. A key challenge for these models is working with millions of categorical classes or tokens. The standard approach is to learn end-to-end, dense latent representations, or embeddings, for each token. The resulting embeddings require large amounts of memory that blow up with the number of tokens. Training and inference with these models create storage and memory-bandwidth bottlenecks, leading to significant computing and energy consumption when deployed in practice. To this end, we present the problem of \textit{Memory Allocation} under a budget for embeddings and propose a novel formulation of memory-shared embeddings, where memory is shared in proportion to the overlap in semantic information. Our formulation admits a practical and efficient randomized solution: Locality-sensitive hashing based Memory Allocation (LMA). We demonstrate a significant reduction in memory footprint while maintaining performance. In particular, our LMA embeddings match the performance of standard embeddings with a 16$\times$ reduction in memory footprint. Moreover, LMA achieves an average improvement of over 0.003 AUC over standard DLRM models across different memory regimes on the Criteo and Avazu datasets.
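To make the idea concrete, the sketch below illustrates one plausible reading of LSH-based memory allocation, not the authors' actual LMA algorithm: each token is assigned $K$ rows of a small shared embedding pool via SimHash of a semantic signature vector, so tokens with similar signatures collide on more rows and thus share memory in proportion to their semantic overlap. All names and parameters (\texttt{POOL\_ROWS}, \texttt{K}, \texttt{SIG\_DIM}, the signature vectors) are illustrative assumptions.

\begin{verbatim}
# Minimal sketch (assumed, not the paper's implementation) of LSH-based
# memory allocation: each token owns K rows of a shared pool, chosen by
# SimHash of a semantic signature. Similar signatures -> more shared rows.
import numpy as np

rng = np.random.default_rng(0)

POOL_ROWS = 1 << 12   # shared memory budget: 4096 rows (assumed)
DIM = 16              # embedding dimension (assumed)
K = 4                 # pool rows assembled per token (assumed)
SIG_DIM = 32          # semantic signature dimension (assumed)

pool = rng.normal(scale=0.01, size=(POOL_ROWS, DIM))  # trainable shared pool
# K independent SimHash functions, each producing log2(POOL_ROWS) sign bits.
planes = rng.normal(size=(K, int(np.log2(POOL_ROWS)), SIG_DIM))

def lsh_rows(signature: np.ndarray) -> np.ndarray:
    """Map a semantic signature to K pool-row indices via SimHash.

    Nearby signatures (small angle) agree on most sign bits, hence
    collide on rows with probability increasing in their similarity.
    """
    bits = (planes @ signature > 0).astype(np.uint64)       # shape (K, 12)
    weights = 1 << np.arange(bits.shape[1], dtype=np.uint64)
    return (bits * weights).sum(axis=1)                     # K row indices

def embed(signature: np.ndarray) -> np.ndarray:
    """Token embedding = mean of its K shared pool rows."""
    return pool[lsh_rows(signature)].mean(axis=0)

# Semantically close tokens share rows; unrelated tokens rarely do.
a = rng.normal(size=SIG_DIM)
b = a + 0.1 * rng.normal(size=SIG_DIM)   # close to `a`
c = rng.normal(size=SIG_DIM)             # unrelated
print("a/b shared rows:", len(np.intersect1d(lsh_rows(a), lsh_rows(b))))
print("a/c shared rows:", len(np.intersect1d(lsh_rows(a), lsh_rows(c))))
\end{verbatim}

Under this reading, the memory budget is set by \texttt{POOL\_ROWS} $\times$ \texttt{DIM} rather than by the vocabulary size, which is what decouples the footprint from the number of tokens.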