CatBoost model with synthetic features in application to loan risk assessment of small businesses - Zhuanzhi Papers


Synthetic features · AUC · Model evaluation · Dataset · MoDELS

June 30, 2021


Haoxue Wang, Liexin Cheng

Loan risk for small businesses has long been a complex problem worth exploring. Predicting loan risk can benefit entrepreneurship and help create more jobs for society. CatBoost (Categorical Boosting) is a powerful machine learning algorithm well suited to datasets with many categorical variables, such as those used for forecasting loan risk. In this paper, we identify the important risk factors that contribute to the loan status classification problem. We then compare the performance of boosting-type algorithms (especially CatBoost) with that of other traditional yet popular ones. The dataset we adopt comes from the U.S. Small Business Administration (SBA) and has a very large sample size (899,164 observations and 27 features). To make the best use of the important features in the dataset, we propose a technique named "synthetic generation" that develops additional combined features based on arithmetic operations, which ends up improving the accuracy and AUC of the original CatBoost model. We obtain a high accuracy of 95.84% and a strong AUC of 98.80% relative to the existing literature on related research.
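The abstract does not say which features are combined or which arithmetic operations are used, so the following is only a minimal sketch of what combining features by arithmetic operations could look like. It assumes pairwise sum, difference, product, and ratio over a chosen set of numeric columns. The function name `add_synthetic_features`, the field names, and the sample values are hypothetical and are not taken from the paper or the SBA dataset.

```python
from itertools import combinations


def add_synthetic_features(rows, columns):
    """Return copies of `rows` extended with pairwise arithmetic combinations.

    For every unordered pair (a, b) of the named numeric columns, four new
    features are added: a + b, a - b, a * b and a / b.
    Assumption: a ratio with a zero denominator is set to 0.0 as a placeholder.
    """
    augmented_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the input records are left untouched
        for a, b in combinations(columns, 2):
            x, y = row[a], row[b]
            new_row[f"{a}_plus_{b}"] = x + y
            new_row[f"{a}_minus_{b}"] = x - y
            new_row[f"{a}_times_{b}"] = x * y
            new_row[f"{a}_over_{b}"] = x / y if y != 0 else 0.0
        augmented_rows.append(new_row)
    return augmented_rows


# Hypothetical loan records; field names and values are illustrative only.
loans = [
    {"loan_amount": 50000.0, "guaranteed_amount": 25000.0, "term_months": 84.0},
    {"loan_amount": 120000.0, "guaranteed_amount": 90000.0, "term_months": 120.0},
]
numeric_columns = ["loan_amount", "guaranteed_amount", "term_months"]
augmented = add_synthetic_features(loans, numeric_columns)
```

Each record keeps its original fields and gains four new ones per column pair. The augmented records could then be passed to a gradient boosting classifier such as CatBoost in place of the raw features. Whether a given combination helps would have to be checked on held-out data.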


