Increasing research interest focuses on sequential recommender systems, which aim to model dynamic sequence representations precisely. However, the loss functions most commonly used in state-of-the-art sequential recommendation models have essential limitations. To name a few: Bayesian Personalized Ranking (BPR) loss suffers from the vanishing gradient problem caused by numerous negative samples, as well as from prediction biases; Binary Cross-Entropy (BCE) loss is sensitive to the number of negative samples, so it is likely to ignore valuable negative examples and to reduce training efficiency; Cross-Entropy (CE) loss focuses only on the last timestamp of the training sequence, which underutilizes the sequence information and results in inferior user sequence representations. To avoid these limitations, in this paper we propose to calculate Cumulative Cross-Entropy (CCE) loss over the whole sequence. CCE is simple and direct, enjoying the virtues of painless deployment, no negative sampling, and effective and efficient training. We conduct extensive experiments on five benchmark datasets to demonstrate the effectiveness and efficiency of CCE. The results show that employing CCE loss in three state-of-the-art models, GRU4Rec, SASRec, and S3-Rec, yields average improvements of 125.63%, 69.90%, and 33.24% in full-ranking NDCG@5, respectively. With CCE, the performance curve of the models on the test data rises rapidly with wall-clock time and is superior to that of other loss functions during almost the whole training process.
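For concreteness, here is a minimal sketch of the idea as described above: CCE applies full-softmax cross-entropy at every timestamp of the training sequence rather than only the last one, with no negative sampling. This is our reading of the abstract alone; the function name, tensor shapes, and padding convention below are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def cumulative_cross_entropy(logits, targets, pad_id=0):
    """Sketch of CCE: full-softmax cross-entropy accumulated over
    every timestamp of the sequence (no negative sampling).

    logits:  (batch, seq_len, num_items) scores over all items.
    targets: (batch, seq_len) next-item ids; pad_id marks padding.
    """
    # Flatten so each timestamp contributes one CE term,
    # instead of scoring only the last position as plain CE does.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # (batch*seq_len, num_items)
        targets.reshape(-1),                  # (batch*seq_len,)
        ignore_index=pad_id,                  # skip padded positions
    )
```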