Recommender models are commonly used to suggest relevant items to users in e-commerce and online advertising applications. These models use massive embedding tables to store numerical representations of the categorical variables of items and users (memory intensive) and employ neural networks (compute intensive) to generate the final recommendations. Training these large-scale recommendation models increasingly demands more data and compute resources. The highly parallel neural-network portion of these models can benefit from GPU acceleration; however, the large embedding tables often cannot fit in the limited GPU device memory. Hence, this paper takes a deep dive into the semantics of the training data to gain insight into the feature access, transfer, and usage patterns of these models. We observe that, due to the popularity of certain inputs, accesses to the embeddings are highly skewed, with a few embedding entries accessed up to 10000x more often than others. This paper leverages this asymmetric access pattern to offer a framework, called FAE, that proposes a hot-embedding aware data layout for training recommender models. This layout uses the scarce GPU memory to store the highly accessed embeddings, thus reducing data transfers from CPU to GPU. At the same time, FAE engages the GPU to accelerate the execution of these hot embedding entries. Experiments on production-scale recommendation models with real datasets show that FAE reduces the overall training time by 2.3x and 1.52x compared to XDL CPU-only and XDL CPU-GPU execution, respectively, while maintaining baseline accuracy.
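To make the hot-embedding idea concrete, here is a minimal Python sketch, not FAE's actual implementation. It counts embedding-row lookups over a sample of training inputs, greedily keeps the most frequently accessed rows that fit in an assumed GPU memory budget, and reports what share of lookups those rows would serve. The function names, the greedy top-k selection, the fixed per-row size, and the rule that an input counts as "hot" only if every one of its lookups hits a GPU-resident row are all illustrative assumptions.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple


def count_accesses(samples: Iterable[Sequence[int]]) -> Counter[int]:
    """Count how often each embedding row id is looked up across sampled inputs."""
    counts: Counter[int] = Counter()
    for row_ids in samples:
        counts.update(row_ids)
    return counts


def select_hot_rows(counts: Counter[int], gpu_budget_bytes: int, bytes_per_row: int) -> Set[int]:
    """Greedily keep the most frequently accessed rows that fit in the GPU budget.

    bytes_per_row is an assumed constant, e.g. embedding_dim * 4 for float32 rows.
    """
    max_rows = gpu_budget_bytes // bytes_per_row
    return {row for row, _ in counts.most_common(max_rows)}


def hot_lookup_fraction(counts: Counter[int], hot: Set[int]) -> float:
    """Share of all lookups that would be served from GPU-resident (hot) rows."""
    total = sum(counts.values())
    return sum(counts[row] for row in hot) / total if total else 0.0


def split_inputs(
    samples: Iterable[Sequence[int]], hot: Set[int]
) -> Tuple[List[Sequence[int]], List[Sequence[int]]]:
    """Assumed rule: an input is hot only if every row it touches is GPU-resident."""
    hot_inputs: List[Sequence[int]] = []
    cold_inputs: List[Sequence[int]] = []
    for row_ids in samples:
        target = hot_inputs if all(r in hot for r in row_ids) else cold_inputs
        target.append(row_ids)
    return hot_inputs, cold_inputs


if __name__ == "__main__":
    # Toy trace: each inner list holds the embedding row ids that one input looks up.
    samples = [[0, 1], [0, 2], [0, 1], [3, 4], [0, 1]]
    counts = count_accesses(samples)
    hot = select_hot_rows(counts, gpu_budget_bytes=128, bytes_per_row=64)  # room for 2 rows
    hot_inputs, cold_inputs = split_inputs(samples, hot)
    print(sorted(hot), hot_lookup_fraction(counts, hot))  # [0, 1] 0.7
    print(len(hot_inputs), "hot inputs,", len(cold_inputs), "cold inputs")
```

In this toy trace, two of five rows serve 70% of lookups. With a highly skewed access distribution like the one the abstract reports, a small GPU-resident set can be expected to cover a large share of lookups, which is why a simple frequency-ranked selection is a natural starting point; choosing the sample size and budget in practice is outside the scope of this sketch.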