Content-based collaborative filtering (CCF) predicts user-item interactions based on both users' interaction history and items' content information. Recently, pre-trained language models (PLMs) have been used to extract high-quality item encodings for CCF. However, it is resource-intensive to train a PLM-based CCF model in an end-to-end (E2E) manner, since optimization involves back-propagating through every content encoding within a given user interaction sequence. To tackle this issue, we propose GRAM (GRadient Accumulation for Multi-modality in CCF), which exploits the fact that a given item often appears multiple times within a batch of interaction histories. Specifically, Single-step GRAM aggregates each item encoding's gradients for back-propagation, with theoretical equivalence to standard E2E training. As an extension of Single-step GRAM, we propose Multi-step GRAM, which increases the gradient update latency, achieving a further speedup with drastically less GPU memory. GRAM significantly improves training efficiency (up to 146x) on five datasets from two task domains, Knowledge Tracing and News Recommendation. Our code is available at https://github.com/yoonseok312/GRAM.
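The core idea of Single-step GRAM can be illustrated with a minimal PyTorch-style sketch (the names `plm_encode`, `ccf_model`, and the batch layout are hypothetical, not the paper's actual implementation): each unique item in a batch is encoded by the PLM only once, and the gradients from all of its occurrences across the interaction sequences accumulate onto that single shared encoding before one backward pass through the PLM.

```python
import torch

def train_step(plm_encode, ccf_model, optimizer, batch):
    # Hypothetical sketch of Single-step GRAM's gradient accumulation idea.
    # batch["item_ids"]: (B, T) item indices per interaction sequence
    # batch["unique_items"]: content tokens for each unique item in the batch
    unique_ids, inverse = torch.unique(batch["item_ids"], return_inverse=True)

    # 1) Encode each unique item exactly once, instead of once per occurrence.
    item_enc = plm_encode(batch["unique_items"])  # (U, d), requires grad

    # 2) Gather encodings back to sequence positions; autograd will sum
    #    (accumulate) the gradients of repeated items onto item_enc.
    seq_enc = item_enc[inverse]  # (B, T, d)

    # 3) Standard CCF loss over the interaction sequences.
    loss = ccf_model.loss(seq_enc, batch["labels"])

    # 4) One backward pass: the gradient w.r.t. each unique encoding is the
    #    sum over all its occurrences, matching standard E2E training.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```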