Credit scoring models built only on accepted applications may be biased, and that bias can have both statistical and economic consequences. Reject inference is the process of attempting to infer the creditworthiness of rejected applications. In this research, we use deep generative models to develop two new semi-supervised Bayesian models for reject inference in credit scoring, in which we model the data-generating process as dependent on a Gaussian mixture. The goal is to improve the classification accuracy of credit scoring models by adding rejected applications. Our proposed models infer the unknown creditworthiness of the rejected applications by exact enumeration of the two possible outcomes of the loan (default or non-default). The efficient stochastic gradient optimization technique used in deep generative models makes our models suitable for large data sets. Finally, the experiments in this research show that our proposed models perform better than classical and alternative machine learning models for reject inference in credit scoring.
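The "exact enumeration" step can be illustrated with a minimal sketch. It assumes the objective for a rejected (unlabeled) application takes the common semi-supervised generative form: a per-label bound is averaged over both loan outcomes using the classifier's predicted probabilities, and the entropy of that prediction is added. The abstract does not give the exact objective, so the function name `unlabeled_bound` and its arguments are hypothetical and chosen only for illustration.

```python
import math


def unlabeled_bound(q_default, bound_default, bound_non_default):
    """Marginalise a per-label bound over the two loan outcomes.

    q_default: assumed classifier probability q(y = default | x), in [0, 1].
    bound_default, bound_non_default: hypothetical per-label objective
        values for the same application x under each outcome.
    """
    probs = (q_default, 1.0 - q_default)
    bounds = (bound_default, bound_non_default)

    # Exact enumeration: weight each outcome's bound by its probability.
    expected = sum(p * b for p, b in zip(probs, bounds))

    # Entropy of the predicted label distribution (0 * log 0 treated as 0).
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)

    return expected + entropy
```

Because there are only two outcomes, the sum over labels is computed exactly rather than by sampling, which is what keeps the unlabeled term cheap enough to evaluate for every rejected application in a mini-batch.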