Empowering Long-tail Item Recommendation through Cross Decoupling Network (CDN)

Recommenders provide personalized content recommendations to users. They often suffer from highly skewed long-tail item distributions, in which a small fraction of the items receives most of the user feedback. This hurts model quality, especially for data slices with little supervision. Existing work in both academia and industry mainly focuses on re-balancing strategies (e.g., up-sampling and up-weighting), leveraging content features, and transfer learning. However, a deeper understanding of how the long-tail distribution influences recommendation performance is still lacking. In this work, we theoretically demonstrate that the prediction of user preference is biased under long-tail distributions. This bias comes from the discrepancy in both the prior and the conditional probabilities between training data and test data. Most existing methods mainly attempt to reduce the bias from the prior perspective and ignore the discrepancy in the conditional probability. This leads to a severe forgetting issue and results in suboptimal performance. To address the problem, we design a novel Cross Decoupling Network (CDN) to reduce the differences in both prior and conditional probabilities. Specifically, CDN (i) decouples the learning processes of memorization and generalization on the item side through a mixture-of-experts structure; and (ii) decouples user samples drawn from different distributions through a regularized bilateral branch network. Finally, a novel adapter is introduced to aggregate the decoupled vectors and softly shift the training attention to tail items. Extensive experimental results show that CDN significantly outperforms state-of-the-art approaches on popular benchmark datasets, improving HR@50 (hit ratio) by 8.7% for overall recommendation and by 12.4% for tail items.
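
The three components named in the abstract (item-side mixture of experts, user-side bilateral branches, and the adapter) can be sketched in NumPy as a forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation: each expert is a single ReLU layer, the gate reads the concatenated item-ID embedding and content features, the rebalanced user branch is fed items sampled with probability inversely proportional to popularity, the "regularized" coupling is assumed to be an L2 penalty between the two branch weights, and the adapter weight follows the cumulative schedule alpha = 1 - (t/T)^2 borrowed from the Bilateral-Branch Network (BBN) for long-tailed image classification. All names (`ItemMoE`, `adapter_alpha`, `cdn_scores`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def relu_layer(x, w):
    # A single ReLU layer standing in for each expert or user-tower MLP (assumption).
    return np.maximum(x @ w, 0.0)


class ItemMoE:
    """Hypothetical item tower: memorization experts read item-ID embeddings,
    generalization experts read content features, and a softmax gate mixes them."""

    def __init__(self, id_dim, feat_dim, out_dim, n_mem=2, n_gen=2):
        self.mem = [rng.normal(0, 0.1, (id_dim, out_dim)) for _ in range(n_mem)]
        self.gen = [rng.normal(0, 0.1, (feat_dim, out_dim)) for _ in range(n_gen)]
        self.gate = rng.normal(0, 0.1, (id_dim + feat_dim, n_mem + n_gen))

    def __call__(self, id_emb, content):
        outs = [relu_layer(id_emb, w) for w in self.mem]
        outs += [relu_layer(content, w) for w in self.gen]
        experts = np.stack(outs, axis=1)                 # (items, experts, out_dim)
        gate = softmax(np.concatenate([id_emb, content], axis=1) @ self.gate)
        return (gate[:, :, None] * experts).sum(axis=1)  # (items, out_dim)


def rebalanced_item_probs(item_counts):
    # Inverse-frequency sampling distribution for the rebalanced user branch (assumption).
    inv = 1.0 / np.asarray(item_counts, dtype=float)
    return inv / inv.sum()


def branch_regularizer(w_main, w_bal):
    # Assumed L2 penalty that keeps the two user branches close to each other.
    return float(np.sum((w_main - w_bal) ** 2))


def adapter_alpha(step, total_steps):
    # BBN-style schedule: 1.0 at the start (original distribution only),
    # decaying to 0.0 at the end (rebalanced distribution only).
    return 1.0 - (step / total_steps) ** 2


def cdn_scores(user_feats, item_vecs, w_main, w_bal, alpha):
    # Adapter: interpolate the two decoupled user vectors, then score every item.
    u = alpha * relu_layer(user_feats, w_main) + (1.0 - alpha) * relu_layer(user_feats, w_bal)
    return u @ item_vecs.T  # (users, items)


# Toy usage with a long-tail popularity profile over 5 items.
n_items, id_dim, feat_dim, user_dim, d = 5, 8, 4, 6, 16
counts = np.array([1000, 200, 50, 10, 2])
id_table = rng.normal(0, 0.1, (n_items, id_dim))
content = rng.normal(0, 1.0, (n_items, feat_dim))
item_vecs = ItemMoE(id_dim, feat_dim, d)(id_table, content)
w_main = rng.normal(0, 0.1, (user_dim, d))
w_bal = rng.normal(0, 0.1, (user_dim, d))
users = rng.normal(0, 1.0, (3, user_dim))

main_batch = rng.choice(n_items, size=8, p=counts / counts.sum())         # head-heavy
bal_batch = rng.choice(n_items, size=8, p=rebalanced_item_probs(counts))  # tail-heavy
print("main branch items:", main_batch, "rebalanced branch items:", bal_batch)
print("branch regularizer:", round(branch_regularizer(w_main, w_bal), 4))
for step in (0, 50, 100):
    alpha = adapter_alpha(step, 100)
    print(step, alpha, cdn_scores(users, item_vecs, w_main, w_bal, alpha).shape)
```

Because the score is a dot product, interpolating the two user vectors with alpha is equivalent to interpolating the two branch scores; as alpha decays from 1 to 0, training emphasis moves from the original, head-dominated distribution toward the rebalanced one.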

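HR@K, the metric behind the reported gains, counts a test case as a hit when the held-out item appears among the model's top-K scored items and averages over test cases; the tail-item figure restricts that average to test cases whose target is a tail item. The sketch below assumes one held-out item per user and a hypothetical head/tail split in which the most popular `head_fraction` of items counts as "head"; the abstract does not specify the paper's split. The function names are illustrative.

```python
import numpy as np


def hit_ratio_at_k(scores, targets, k):
    """Fraction of rows whose held-out target item is among the top-k scores.

    scores:  (n_users, n_items) predicted preference scores
    targets: one held-out item index per user
    """
    topk = np.argsort(-scores, axis=1)[:, :k]                  # top-k item indices per user
    hits = (topk == np.asarray(targets)[:, None]).any(axis=1)  # did the target appear?
    return float(hits.mean())


def tail_mask(targets, item_counts, head_fraction=0.2):
    # Hypothetical split: the most popular `head_fraction` of items are "head", the rest "tail".
    order = np.argsort(-np.asarray(item_counts))
    n_head = max(1, int(round(head_fraction * len(item_counts))))
    head = set(order[:n_head].tolist())
    return np.array([t not in head for t in np.asarray(targets).tolist()], dtype=bool)


scores = np.array([[0.9, 0.1, 0.5, 0.3],
                   [0.2, 0.8, 0.1, 0.4],
                   [0.3, 0.2, 0.1, 0.9]])
targets = np.array([2, 0, 3])
counts = [500, 100, 20, 5]

mask = tail_mask(targets, counts, head_fraction=0.25)
print("HR@2 overall:", hit_ratio_at_k(scores, targets, k=2))
print("HR@2 tail:", hit_ratio_at_k(scores[mask], targets[mask], k=2))
```

Here users 0 and 2 have their targets in the top 2, so overall HR@2 is 2/3; with a 25% head split only item 0 is "head", both hits fall on tail items, and tail HR@2 is 1.0.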

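Related content

CDN

CDN stands for Content Delivery Network. Its basic idea is to avoid, as far as possible, the bottlenecks and links on the Internet that can affect the speed and stability of data transmission, so that content is delivered faster and more reliably. By placing node servers throughout the network, a CDN forms an intelligent virtual network layer on top of the existing Internet. The CDN system can redirect a user's request in real time to the service node closest to the user, based on combined information such as network traffic, the connection and load status of each node, and the distance and response time to the user. The goal is to let users fetch the content they need from a nearby node, relieve Internet congestion, and improve the response speed when users access websites.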

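NeurIPS 2022 | A Categorized Collection of NLP Papers

Zhuanzhi Member Service

51+ reads · October 2, 2022

Not to Be Missed! "100 Lectures on Machine Learning" Course, Taught by Mark Schmidt (UBC)

Zhuanzhi Member Service

76+ reads · June 28, 2022

[Practical Book] Synthetic Data for Deep Learning, 354-page PDF

Zhuanzhi Member Service

104+ reads · February 10, 2022

100 Must-Read Classic NLP Papers

Zhuanzhi Member Service

124+ reads · September 8, 2020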