Modern Sequential Recommendation (SR) models commonly use modality features to represent items, motivated in large part by recent advances in language and vision modeling. Several works completely replace ID embeddings with modality embeddings, claiming that modality embeddings can match or even exceed ID embedding performance and therefore render ID embeddings unnecessary. Other works jointly use ID and modality features, but posit that complex fusion strategies, such as multi-stage training and/or intricate alignment architectures, are necessary for this joint use. Underlying both lines of work, however, is a limited understanding of how ID and modality features complement each other. In this work, we address this gap by studying the complementarity of ID- and text-based SR models. We show that these models do learn complementary signals, meaning that either should provide a performance gain when used properly alongside the other. Motivated by this, we propose a new SR method that preserves ID-text complementarity through independent model training, then harnesses it through a simple ensembling strategy. Despite its simplicity, this method outperforms several competitive SR baselines, implying that both ID and text features are necessary for state-of-the-art SR performance but complex fusion architectures are not.
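The ensembling strategy described above can be illustrated with a minimal sketch. The abstract does not specify the exact combination rule, so the following assumes a common score-level ensemble: per-user ranking scores from the independently trained ID-based and text-based models are normalized to a shared scale and mixed by a convex weight `alpha` (both the normalization choice and `alpha` are illustrative assumptions, not the paper's stated design).

```python
import numpy as np

def ensemble_scores(id_scores, text_scores, alpha=0.5):
    """Combine per-item ranking scores from two independently trained
    SR models via a convex weighted sum.

    Scores are min-max normalized per user first, since the two
    models' raw logit scales are generally not comparable.
    Note: this combination rule is an illustrative assumption.
    """
    def normalize(s):
        s = np.asarray(s, dtype=float)
        lo, hi = s.min(), s.max()
        # Guard against a constant score vector (zero range).
        return (s - lo) / (hi - lo) if hi > lo else np.zeros_like(s)

    return alpha * normalize(id_scores) + (1.0 - alpha) * normalize(text_scores)

# Toy example: rank three candidate items for one user.
id_scores = [2.0, 1.0, 0.5]    # hypothetical logits from the ID-based model
text_scores = [0.8, 0.9, 0.1]  # hypothetical logits from the text-based model
combined = ensemble_scores(id_scores, text_scores, alpha=0.5)
ranking = np.argsort(-combined)  # best item first
```

Because the two models are trained independently, each retains its own view of the data; the mixing weight `alpha` then trades off how much each view contributes to the final ranking.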