Deep learning recommendation models (DLRMs) have been widely applied in Internet companies. The embedding tables of DLRMs are too large to fit entirely in GPU memory. We propose a GPU-based software cache approach to dynamically manage the embedding table across CPU and GPU memory by leveraging the ID frequency statistics of the target dataset. Our proposed software cache enables efficient training of entire DLRMs on GPU with synchronized parameter updates. It also scales to multiple GPUs in combination with widely used hybrid-parallel training approaches. Evaluation of our prototype system shows that keeping only 1.5% of the embedding parameters in GPU memory is sufficient to obtain a decent end-to-end training speed.
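The core idea can be illustrated with a minimal sketch: a small "GPU-resident" buffer holds the hottest embedding rows, fetches missing rows from the full table in CPU memory, and evicts the least-frequently-used row when full. The class and variable names below are hypothetical, and plain NumPy arrays stand in for CPU- and GPU-side memory; this is an assumption-laden simplification, not the paper's actual implementation.

```python
import numpy as np

EMB_DIM = 8
NUM_IDS = 1000
CACHE_CAPACITY = 64  # number of rows kept in (simulated) GPU memory

# The full embedding table lives in (simulated) CPU memory.
cpu_table = np.random.rand(NUM_IDS, EMB_DIM).astype(np.float32)

class FreqAwareCache:
    """Keeps frequently accessed embedding rows in a small buffer
    standing in for GPU memory; evicts the least-frequently-used
    cached row (writing it back to the CPU table) when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.gpu_rows = {}  # id -> row copy (stands in for GPU memory)
        self.freq = {}      # id -> access count (frequency statistics)

    def lookup(self, idx):
        self.freq[idx] = self.freq.get(idx, 0) + 1
        if idx in self.gpu_rows:
            return self.gpu_rows[idx]  # cache hit: row already on "GPU"
        if len(self.gpu_rows) >= self.capacity:
            # Evict the least-frequently-used cached id and
            # write its (possibly updated) row back to CPU memory.
            victim = min(self.gpu_rows, key=lambda i: self.freq[i])
            cpu_table[victim] = self.gpu_rows.pop(victim)
        # Cache miss: fetch the row from CPU memory into the cache.
        self.gpu_rows[idx] = cpu_table[idx].copy()
        return self.gpu_rows[idx]
```

In a real system the hot rows would live in an actual GPU tensor and transfers would be batched per training iteration, but the hit/miss/evict logic follows the same pattern.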