Evaluation of Tree Based Regression over Multiple Linear Regression for Non-normally Distributed Data in Battery Performance

November 3, 2021

Shovan Chowdhury, Yuxiao Lin, Boryann Liaw, Leslie Kerby

From arXiv, 16 pages

Battery performance datasets are typically non-normal and multicollinear, and extrapolating from such datasets for model predictions requires attention to these characteristics. This study explores the impact of data normality on building machine learning models. Tree-based regression models and multiple linear regression models are each built from a highly skewed, non-normal dataset with multicollinearity, and the two are compared. Several techniques, such as data transformation, are necessary to achieve a good multiple linear regression model with this dataset; the most useful ones are discussed. With these techniques, the best multiple linear regression model achieved R^2 = 81.23% and exhibited no multicollinearity effect for the dataset used in this study. Tree-based models perform better on this dataset because they are non-parametric, can handle complex relationships among variables, and are not affected by multicollinearity. We show that bagging, as used in Random Forests, reduces overfitting. Our best tree-based model achieved R^2 = 97.73%. This study explains why tree-based regression shows promise as a machine learning model for non-normally distributed, multicollinear data.
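The abstract does not say which transformation or multicollinearity diagnostic the authors used. As a rough illustration of the kind of preparation a linear model needs on skewed, collinear data, the sketch below assumes a log transform of the target and the variance inflation factor (VIF) as the diagnostic. Both are common choices, not confirmed as the paper's method. The data are synthetic stand-ins, not the battery dataset, and the helper names (`skewness`, `r_squared`, `vif`) are hypothetical.

```python
import numpy as np


def skewness(x):
    """Biased sample skewness: third central moment over variance^1.5."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float((centered ** 3).mean() / (centered ** 2).mean() ** 1.5)


def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(1.0 - resid.var() / y.var())


def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing that column on the others."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        factors.append(1.0 / (1.0 - r_squared(others, X[:, j])))
    return np.array(factors)


# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)  # nearly a copy of x1: multicollinear
x3 = rng.normal(size=n)
# Log-normal target, so y is strongly right-skewed.
y = np.exp(0.8 * x1 + 0.5 * x3 + 0.1 * rng.normal(size=n))

X_all = np.column_stack([x1, x2, x3])
X_reduced = np.column_stack([x1, x3])  # drop the redundant predictor

print("skewness of y and log(y):", skewness(y), skewness(np.log(y)))
print("VIF, all predictors:     ", vif(X_all))
print("VIF, x2 dropped:         ", vif(X_reduced))
print("R^2, raw target:         ", r_squared(X_reduced, y))
print("R^2, log target:         ", r_squared(X_reduced, np.log(y)))
```

On data like this, the log transform removes most of the skew and the linear fit improves, while dropping the redundant predictor brings the VIF values close to 1. A tree-based model needs neither step, because its splits do not depend on monotone rescaling of predictors or on unique coefficient estimates.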
