Although Micro, Small, and Medium Enterprises (MSMEs) account for 96.1% of all businesses in Malaysia, access to financing remains one of their most persistent challenges. Newly established or young businesses are often excluded from formal credit markets because traditional underwriting approaches rely heavily on credit bureau data. This study investigates the potential of bank statement data as an alternative data source for credit assessment to promote financial inclusion in emerging markets. Firstly, we propose a cash flow-based underwriting pipeline that uses bank statement data for end-to-end data extraction and machine learning credit scoring. Secondly, we introduce a novel dataset of 611 loan applicants from a Malaysian lending institution. Thirdly, we develop and evaluate credit scoring models based on application information and bank transaction-derived features. Empirical results show that the use of bank statement data improves the performance of all models on our dataset, which suggests that it can improve credit scoring for new-to-lending MSMEs. Lastly, we intend to release the anonymised bank transaction dataset to facilitate further research on MSME financial inclusion within Malaysia's emerging economy.