Sequential recommender systems aim to model users' evolving interests from their historical behaviors and thereby make customized, time-relevant recommendations. Compared with traditional models, deep learning approaches such as CNNs and RNNs have achieved remarkable advances in recommendation tasks. Recently, the BERT framework has also emerged as a promising method, benefiting from its self-attention mechanism for processing sequential data. However, one limitation of the original BERT framework is that it considers only a single input source, namely natural language tokens. How to leverage various types of information under the BERT framework remains an open question. Nonetheless, it is intuitively appealing to utilize side information, such as item category or tag, for more comprehensive depictions and better recommendations. In our pilot experiments, we found that naive approaches, which directly fuse side information into the item embeddings, usually bring very little benefit or even negative effects. Therefore, in this paper, we propose the NOninVasive self-attention mechanism (NOVA) to leverage side information effectively under the BERT framework. NOVA uses side information to generate a better attention distribution rather than directly altering the item embeddings, since such alteration may overwhelm the item information. We validate the NOVA-BERT model on both public and commercial datasets, and our method consistently outperforms state-of-the-art models with negligible computational overhead.
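To make the contrast between invasive and non-invasive fusion concrete, the following is a minimal single-head sketch in NumPy. It is one possible reading of the description above, not the paper's implementation: the additive fusion of item and side embeddings, the absence of masking and positional encoding, and all names (`invasive_attention`, `noninvasive_attention`, `Wq`, `Wk`, `Wv`) are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def invasive_attention(item_emb, side_emb, Wq, Wk, Wv):
    """Naive baseline: side information is fused into the item embeddings,
    so it flows into queries, keys AND values."""
    fused = item_emb + side_emb  # assumption: fusion by addition
    q, k, v = fused @ Wq, fused @ Wk, fused @ Wv
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return weights @ v, weights


def noninvasive_attention(item_emb, side_emb, Wq, Wk, Wv):
    """Hypothetical non-invasive variant: side information only shapes the
    attention distribution (queries and keys); values stay item-only."""
    fused = item_emb + side_emb  # assumption: fusion by addition
    q, k = fused @ Wq, fused @ Wk
    v = item_emb @ Wv  # item embeddings are not altered by side information
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return weights @ v, weights


# Toy data: a sequence of 5 items with 8-dimensional embeddings.
rng = np.random.default_rng(0)
seq_len, dim = 5, 8
item_emb = rng.normal(size=(seq_len, dim))
side_emb = rng.normal(size=(seq_len, dim))  # e.g. embedded category or tag
Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))

out, weights = noninvasive_attention(item_emb, side_emb, Wq, Wk, Wv)
```

In this sketch, each output row is a weighted average of projected item embeddings only; the side information changes which items receive attention but never enters the representation being aggregated.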