[Recommended] xLearn: A Machine Learning Library Built for Large-Scale Sparse Data - 专知


November 25, 2017, 机器学习研究会


Abstract

Reposted from: 马超Terminal

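In machine learning, beyond deep learning and tree models (GBDT, RF), handling high-dimensional sparse data efficiently is an important problem in its own right. Algorithms such as sparse LR, FM, and FFM are widely used in production systems and in Kaggle competitions. Existing open-source packages such as liblinear, libfm, and libffm each target only one specific algorithm, and they fall short on scalability, flexibility, and ease of use. For this reason I developed xLearn during my PhD: a machine learning library built specifically for large-scale sparse data, which was previously demonstrated at NIPS. After further polishing, it is now open source at http://t.cn/RYUMtlL. Our vision is to make xLearn a de facto industry standard, like xgboost and MXNet. Compared with existing software, xLearn has four main advantages:

(1) Generality. A single unified architecture covers the mainstream algorithms (lr, fm, ffm, and others), so users no longer need to switch between different packages.

(2) Performance. xLearn is written in high-performance C++, provides cache-aware and lock-free learning, and is hand-optimized with SSE/AVX instructions. In tests on a single MacBook Pro, xLearn can be 13x faster than libfm and 5x faster than libffm and liblinear (benchmark on the Criteo CTR dataset).

(3) Ease of use and flexibility. xLearn offers a simple Python interface and bundles many features that are useful in machine learning competitions, such as cross-validation and early stopping. Users can also freely choose the optimization algorithm (for example SGD, AdaGrad, or FTRL).

(4) Scalability. xLearn supports out-of-core computation, using external storage to process 1 TB of data on a single machine. It also provides distributed training.

I hope more people will join this open-source project!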
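To make the FM model mentioned above concrete, here is a minimal pure-Python sketch of how a second-order factorization machine scores one sparse example. It is an illustration only, not xLearn's implementation: the function names `fm_score` and `fm_score_naive`, the argument layout, and the toy weights are all assumptions made for this example. The point is that only the nonzero features are visited, which is what makes such models practical on high-dimensional sparse data.

```python
def fm_score(features, w0, w, V):
    """Score one sparse example with a second-order factorization machine.

    features: list of (index, value) pairs for the nonzero entries only.
    w0: global bias; w: list of linear weights; V: list of latent vectors.
    (Hypothetical layout chosen for this sketch.)
    """
    k = len(V[0])
    score = w0
    sum_f = [0.0] * k     # per-factor sum of v_if * x_i
    sum_sq_f = [0.0] * k  # per-factor sum of (v_if * x_i)^2
    for i, x in features:
        score += w[i] * x
        for f in range(k):
            t = V[i][f] * x
            sum_f[f] += t
            sum_sq_f[f] += t * t
    # Pairwise term in O(k * nnz) instead of O(k * nnz^2):
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    score += 0.5 * sum(s * s - q for s, q in zip(sum_f, sum_sq_f))
    return score


def fm_score_naive(features, w0, w, V):
    """Reference version that enumerates every pair of nonzero features."""
    score = w0 + sum(w[i] * x for i, x in features)
    for a in range(len(features)):
        for b in range(a + 1, len(features)):
            i, xi = features[a]
            j, xj = features[b]
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            score += dot * xi * xj
    return score


# Toy model: 3 features, 2 latent factors; the example has 2 nonzeros.
w0 = 0.5
w = [0.1, 0.0, 0.2]
V = [[1.0, 0.0], [0.0, 0.0], [0.5, 1.0]]
example = [(0, 1.0), (2, 2.0)]
print(fm_score(example, w0, w, V), fm_score_naive(example, w0, w, V))
```

Both functions compute the same value; the first does so in time linear in the number of nonzero features, while the second is kept only as a readable cross-check.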

What is xLearn?

xLearn is a high-performance, easy-to-use, and scalable machine learning package for solving large-scale classification and regression problems. If you are a user of liblinear, libfm, or libffm, xLearn is now another, better choice.
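For readers coming from liblinear, libfm, or libffm, the sketch below shows the two sparse text formats those tools are commonly fed: the libsvm-style `label index:value` line and the libffm-style `label field:index:value` line. The parser names are hypothetical helpers written for this example, and whether a given xLearn version accepts exactly these inputs should be checked against its documentation.

```python
def parse_libsvm_line(line):
    """Parse 'label index:value ...' into (label, [(index, value), ...])."""
    parts = line.split()
    label = float(parts[0])
    feats = []
    for tok in parts[1:]:
        idx, val = tok.split(":")
        feats.append((int(idx), float(val)))
    return label, feats


def parse_libffm_line(line):
    """Parse 'label field:index:value ...' into (label, [(field, index, value), ...])."""
    parts = line.split()
    label = float(parts[0])
    feats = []
    for tok in parts[1:]:
        field, idx, val = tok.split(":")
        feats.append((int(field), int(idx), float(val)))
    return label, feats


# Only nonzero features appear on each line, so storage grows with the
# number of nonzeros rather than with the full feature dimension.
print(parse_libsvm_line("1 3:0.5 10:1"))
print(parse_libffm_line("0 0:3:0.5 1:10:1"))
```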

Performance

xLearn is written in high-performance C++ with careful design and optimization. The system is designed to maximize CPU and memory utilization, provide cache-aware computation, and support lock-free learning. By combining these techniques, xLearn is 5x to 13x faster than similar systems.

Link:

https://github.com/aksnzhy/xlearn

Original post:

https://m.weibo.cn/1633615122/4177635582125091


