The value of text for small business default prediction: A deep learning approach

Compared with consumer lending, Micro, Small and Medium Enterprise (mSME) credit risk modelling is particularly challenging because the same sources of information are often not available. Therefore, it is standard policy for a loan officer to provide a textual loan assessment to mitigate the limited data availability. In turn, this assessment is analysed by a credit expert alongside any available standard credit data. In our paper, we exploit recent advances from the fields of Deep Learning and Natural Language Processing (NLP), including the BERT (Bidirectional Encoder Representations from Transformers) model, to extract information from 60 000 textual assessments provided by a lender. We measure performance with the AUC (Area Under the receiver operating characteristic curve) and the Brier Score, and find that the text alone is surprisingly effective for predicting default. However, when combined with traditional data, it yields no additional predictive capability, with performance dependent on the text's length. Our proposed deep learning model does, however, appear to be robust to the quality of the text and is therefore suitable for partly automating the mSME lending process. We also demonstrate how the content of loan assessments influences performance, leading us to a series of recommendations on a new strategy for collecting future mSME loan assessments.
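The abstract reports results in terms of AUC and Brier Score. As an illustration only, the sketch below computes both metrics from scratch for a binary default label (1 = default, 0 = no default). The function names `auc` and `brier_score` and the toy data are hypothetical; the paper does not say how its metrics were computed. AUC is computed here with the Mann-Whitney formulation (the probability that a random defaulter is scored above a random non-defaulter), and the Brier Score is the mean squared error of the predicted default probabilities.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a randomly chosen default (label 1) gets a higher
    score than a randomly chosen non-default (label 0); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one default and one non-default")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between the predicted default probability
    and the observed outcome (lower is better)."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and equal length")
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


if __name__ == "__main__":
    # Hypothetical toy data: four loans, two of which defaulted.
    y = [0, 0, 1, 1]
    p = [0.10, 0.40, 0.35, 0.80]
    print(auc(y, p))          # 0.75
    print(brier_score(y, p))  # approximately 0.158
```

The quadratic pairwise loop keeps the definition transparent; for tens of thousands of loans a rank-based computation (or a library routine) would be the practical choice.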
