The aim of this paper is to develop and test advanced analytical methods that improve the prediction accuracy of credit risk models while preserving model interpretability. In particular, the project focuses on applying an explainable machine learning model to PSD2-related databases. The input data consist solely of synthetic account transactions generated from a pool of Italian commercial banks. Among all the models tested, CatBoost showed the highest performance. After hyper-parameter tuning combined with the algorithm's built-in class-weight resampling method, the implementation achieves a Gini of 0.45. The SHAP package is used to provide global and local interpretations of the model predictions, giving a human-comprehensible account of how the decision-making algorithm works. The 20 most important features are selected using Shapley values to present a fully human-understandable model that reveals how the attributes of an individual are related to the model's prediction for that individual.
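The abstract reports a Gini of 0.45 without defining the metric. A minimal sketch follows, assuming the usual credit-scoring convention Gini = 2·AUC − 1, under which 0.45 would correspond to an AUC of about 0.725. The function names `auc` and `gini` and the toy data are hypothetical illustrations, not the authors' code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0),
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    """Gini coefficient under the assumed convention Gini = 2 * AUC - 1."""
    return 2.0 * auc(labels, scores) - 1.0


# Toy data (made up for illustration): 1 = default, 0 = non-default.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_gini = gini(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of observations, so it is only suitable for small illustrations; a rank-based AUC from a standard library would be the practical choice on real data.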