Extra-large datasets are becoming increasingly accessible, and computing tools designed to handle huge amounts of data efficiently are rapidly being democratized. However, conventional statistical and econometric tools still struggle with such large datasets. This paper addresses econometrics on big datasets, focusing specifically on logistic regression on Spark. We review the robustness of the functions available in Spark for fitting logistic regression and introduce a package we developed in PySpark that returns the statistical summary of the logistic regression needed for statistical inference.
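For context on the Spark functionality discussed above, the following is a minimal sketch of fitting a logistic regression with the built-in pyspark.ml API; the dataset, column names, and application name are illustrative assumptions, not material from the paper.

```python
# Illustrative sketch: fitting a logistic regression with pyspark.ml.
# The toy data and column names below are made-up examples.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("logit-example").getOrCreate()

# Toy dataset: two regressors and a binary outcome.
df = spark.createDataFrame(
    [(1.0, 2.0, 0.0), (2.0, 1.5, 1.0), (3.0, 0.5, 0.0), (4.0, 0.2, 1.0)],
    ["x1", "x2", "label"],
)

# pyspark.ml expects the regressors packed into a single vector column.
assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="features")
model = LogisticRegression(featuresCol="features", labelCol="label").fit(
    assembler.transform(df)
)

# Point estimates are available, but the built-in training summary does not
# report standard errors or p-values, which is the gap the paper's package
# aims to fill for statistical inference.
print(model.coefficients, model.intercept)
```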