关于抽样抽样合作过滤数据集 (On Sampling Collaborative Filtering Datasets) - 专知论文

会员服务 ·

0

Performer · 样本 · Performance · 数据集 · 协同过滤 ·

2022 年 1 月 13 日

On Sampling Collaborative Filtering Datasets

翻译：关于抽样抽样合作过滤数据集

Noveen Sachdeva,Carole-Jean Wu,Julian McAuley

from arxiv, 9 pages, 4 figures, accepted for publication at WSDM '22. arXiv admin note: substantial text overlap with arXiv:2107.04984

We study the practical consequences of dataset sampling strategies on the ranking performance of recommendation algorithms. Recommender systems are generally trained and evaluated on samples of larger datasets. Samples are often taken in a naive or ad-hoc fashion: e.g. by sampling a dataset randomly or by selecting users or items with many interactions. As we demonstrate, commonly-used data sampling schemes can have significant consequences on algorithm performance. Following this observation, this paper makes three main contributions: (1) characterizing the effect of sampling on algorithm performance, in terms of algorithm and dataset characteristics (e.g. sparsity characteristics, sequential dynamics, etc.); (2) designing SVP-CF, which is a data-specific sampling strategy, that aims to preserve the relative performance of models after sampling, and is especially suited to long-tailed interaction data; and (3) developing an oracle, Data-Genie, which can suggest the sampling scheme that is most likely to preserve model performance for a given dataset. The main benefit of Data-Genie is that it will allow recommender system practitioners to quickly prototype and compare various approaches, while remaining confident that algorithm performance will be preserved, once the algorithm is retrained and deployed on the complete data. Detailed experiments show that using Data-Genie, we can discard upto 5x more data than any sampling strategy with the same level of performance.

翻译：我们研究了数据集抽样战略对建议算法绩效排名的实际后果。建议系统一般经过培训,对较大数据集的样本进行评估。抽样往往以天真或临时的方式进行:例如随机抽样数据集,或选择用户或项目,有许多互动。我们证明,常用的数据抽样方案可能对算法绩效产生重大影响。根据这项观察,本文件提出三个主要贡献:(1) 将抽样对算法绩效的影响定性为算法和数据集特征(例如孔径特征、相继动态等));(2) 设计SVP-CF,这是针对特定数据的抽样战略,目的是在取样后保持模型的相对性能,特别适合长期的交互数据;(3) 开发一个Ocle,即数据-Genie,它可以提出最有可能为特定数据集保存模型绩效的取样方案。Data-Genie的主要好处是,它将允许推荐的系统从业者迅速进行原型和比较各种方法,同时保持信心,在取样后将保持模型业绩与我们所部署的任何详细程度。

0

相关内容

Performer

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

324+阅读 · 2020年11月26日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

2021 推荐系统领域最新研究进展

2021 推荐系统领域最新研究进展

图与推荐

1+阅读 · 2021年10月26日

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

开放知识图谱

0+阅读 · 2021年9月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新七篇知识图谱相关论文—嵌入式知识、Zero-shot识别、知识图谱嵌入、网络库、变分推理、解释、弱监督

【论文推荐】最新七篇知识图谱相关论文—嵌入式知识、Zero-shot识别、知识图谱嵌入、网络库、变分推理、解释、弱监督

专知

19+阅读 · 2018年3月26日

随机进程代数模型的Fluid逼近问题研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于概率图模型的海量评分数据分析与用户行为建模

国家自然科学基金

0+阅读 · 2014年12月31日

多源农业遥感数据的尺度转换

国家自然科学基金

0+阅读 · 2013年12月31日

基于Exemplar-Classifier思想的高分辨率光学遥感影像目标识别研究

国家自然科学基金

2+阅读 · 2013年12月31日

考虑长期监测应力时序的钢箱梁桥疲劳评估方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

Vlasov-Poisson-Boltzmann方程研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于视频信号空时稀疏的压缩感知重构方法

国家自然科学基金

0+阅读 · 2012年12月31日

扩展的模糊逻辑与基于蕴涵算子的Rough逻辑

国家自然科学基金

0+阅读 · 2011年12月31日

基于视觉感知的多视点视频编码研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于密集图像/视频采样的三维重建技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

Approximate Sampling and Counting of Graphs with Near-$P$-stable Degree Intervals

Arxiv

0+阅读 · 2022年4月20日

HRCF: Enhancing Collaborative Filtering via Hyperbolic Geometric Regularization

Arxiv

0+阅读 · 2022年4月18日

MetaKG: Meta-learning on Knowledge Graph for Cold-start Recommendation

Arxiv

0+阅读 · 2022年4月18日

A Survey on Neural Recommendation: From Collaborative Filtering to Content and Context Enriched Recommendation

Arxiv

25+阅读 · 2021年4月27日

Reinforced Negative Sampling over Knowledge Graph for Recommendation

Arxiv

17+阅读 · 2020年3月12日

MeLU: Meta-Learned User Preference Estimator for Cold-Start Recommendation

MeLU: Meta-Learned User Preference Estimator for Cold-Start Recommendation

Arxiv

39+阅读 · 2019年7月31日

Ripple Network: Propagating User Preferences on the Knowledge Graph for Recommender Systems

Arxiv

14+阅读 · 2018年5月19日

Learning Heterogeneous Knowledge Base Embeddings for Explainable Recommendation

Arxiv

11+阅读 · 2018年5月9日

Attention-based Group Recommendation

Arxiv

14+阅读 · 2018年4月18日

Learning over Knowledge-Base Embeddings for Recommendation

Arxiv

23+阅读 · 2018年3月22日

VIP会员

文章信息

相关主题

相关VIP内容

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

324+阅读 · 2020年11月26日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

大型语言模型遇上文本属性图：一种融合框架与应用的综述

人工智能赋能自主武器与人类控制第三部分：人类控制与系统操作员 | 35页

【博士论文】用于概率程序与生成模型的变分推断

军事指挥控制系统：2025年5种用途

相关资讯

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

2021 推荐系统领域最新研究进展

2021 推荐系统领域最新研究进展

图与推荐

1+阅读 · 2021年10月26日

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

开放知识图谱

0+阅读 · 2021年9月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新七篇知识图谱相关论文—嵌入式知识、Zero-shot识别、知识图谱嵌入、网络库、变分推理、解释、弱监督

【论文推荐】最新七篇知识图谱相关论文—嵌入式知识、Zero-shot识别、知识图谱嵌入、网络库、变分推理、解释、弱监督

专知

19+阅读 · 2018年3月26日

相关论文

Approximate Sampling and Counting of Graphs with Near-$P$-stable Degree Intervals

Arxiv

0+阅读 · 2022年4月20日

HRCF: Enhancing Collaborative Filtering via Hyperbolic Geometric Regularization

Arxiv

0+阅读 · 2022年4月18日

MetaKG: Meta-learning on Knowledge Graph for Cold-start Recommendation

Arxiv

0+阅读 · 2022年4月18日

A Survey on Neural Recommendation: From Collaborative Filtering to Content and Context Enriched Recommendation

Arxiv

25+阅读 · 2021年4月27日

Reinforced Negative Sampling over Knowledge Graph for Recommendation

Arxiv

17+阅读 · 2020年3月12日

MeLU: Meta-Learned User Preference Estimator for Cold-Start Recommendation

MeLU: Meta-Learned User Preference Estimator for Cold-Start Recommendation

Arxiv

39+阅读 · 2019年7月31日

Ripple Network: Propagating User Preferences on the Knowledge Graph for Recommender Systems

Arxiv

14+阅读 · 2018年5月19日

Learning Heterogeneous Knowledge Base Embeddings for Explainable Recommendation

Arxiv

11+阅读 · 2018年5月9日

Attention-based Group Recommendation

Arxiv

14+阅读 · 2018年4月18日

Learning over Knowledge-Base Embeddings for Recommendation

Arxiv

23+阅读 · 2018年3月22日

相关基金

随机进程代数模型的Fluid逼近问题研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于概率图模型的海量评分数据分析与用户行为建模

国家自然科学基金

0+阅读 · 2014年12月31日

多源农业遥感数据的尺度转换

国家自然科学基金

0+阅读 · 2013年12月31日

基于Exemplar-Classifier思想的高分辨率光学遥感影像目标识别研究

国家自然科学基金

2+阅读 · 2013年12月31日

考虑长期监测应力时序的钢箱梁桥疲劳评估方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

Vlasov-Poisson-Boltzmann方程研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于视频信号空时稀疏的压缩感知重构方法

国家自然科学基金

0+阅读 · 2012年12月31日

扩展的模糊逻辑与基于蕴涵算子的Rough逻辑

国家自然科学基金

0+阅读 · 2011年12月31日

基于视觉感知的多视点视频编码研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于密集图像/视频采样的三维重建技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员