Learning objectives of recommender models remain largely unexplored. Most methods routinely adopt either a pointwise or a pairwise loss to train model parameters, while rarely paying attention to the softmax loss due to its high computational cost. The sampled softmax loss emerges as an efficient substitute for the softmax loss. Its special case, the InfoNCE loss, has been widely used in self-supervised learning and has exhibited remarkable performance in contrastive learning. Nonetheless, few studies use the sampled softmax loss as the learning objective to train recommenders. Worse still, to the best of our knowledge, none of them explore its properties or answer "Does the sampled softmax loss suit item recommendation?" and "What are the conceptual advantages of the sampled softmax loss, as compared with the prevalent losses?". In this work, we aim to better understand the sampled softmax loss for item recommendation. Specifically, we first theoretically reveal three model-agnostic advantages: (1) mitigating popularity bias, which benefits long-tail recommendation; (2) mining hard negative samples, which provides informative gradients to optimize model parameters; and (3) maximizing the ranking metric, which facilitates top-K performance. Moreover, we probe its model-specific characteristics on top of various recommenders. Experimental results suggest that the sampled softmax loss is more friendly to history- and graph-based recommenders (e.g., SVD++ and LightGCN), but performs poorly with ID-based models (e.g., MF). We ascribe this to its shortcoming in learning representation magnitudes: when combined with models that are likewise incapable of adjusting representation magnitudes, it learns poor representations. In contrast, history- and graph-based models, which naturally adjust representation magnitudes according to node degree, are able to compensate for this shortcoming of the sampled softmax loss.
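For concreteness, a common form of the sampled softmax loss is sketched below in our own notation; the scoring function $f$, temperature $\tau$, and sampled negative set $\mathcal{N}$ are illustrative conventions, not symbols taken from this abstract:
\[
\mathcal{L}_{\mathrm{SSM}}(u, i) \;=\; -\log \frac{\exp\!\big(f(u, i)/\tau\big)}{\exp\!\big(f(u, i)/\tau\big) \;+\; \sum_{j \in \mathcal{N}} \exp\!\big(f(u, j)/\tau\big)},
\]
where $i$ is an observed (positive) item for user $u$ and $\mathcal{N}$ is a small set of negative items sampled from the catalog, so the normalization runs over $|\mathcal{N}| + 1$ items rather than the full catalog as in the softmax loss. When $f$ is a temperature-scaled cosine similarity, this form coincides with the InfoNCE loss mentioned above.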