Financial exclusion constrains entrepreneurship, increases income volatility, and widens wealth gaps. Underbanked consumers in Istanbul often have no credit bureau file because their earnings and payments flow through informal channels. To study how such borrowers can be evaluated, we create a synthetic dataset of 100,000 Istanbul residents that reproduces first-quarter 2025 TÜİK census marginals and telecom usage patterns. A retrieval-augmented generation (RAG) pipeline feeds these public statistics into the OpenAI o3 model, which synthesises realistic yet privacy-preserving records. Each profile contains seven sociodemographic variables and nine alternative attributes describing phone specifications, online shopping rhythm, subscription spend, car ownership, monthly rent, and a credit card flag. To test the impact of the alternative financial data, CatBoost, LightGBM, and XGBoost are each trained in two versions: Demo models use only the sociodemographic variables, while Full models use both the sociodemographic and the alternative attributes. Under stratified five-fold cross-validation, the alternative block raises the area under the ROC curve (AUC) by about 1.3 percentage points and lifts balanced \(F_{1}\) from roughly 0.84 to 0.95, a relative gain of about fourteen percent. We contribute an open Istanbul 2025 Q1 synthetic dataset, a fully reproducible modeling pipeline, and empirical evidence that a concise set of behavioural attributes can approach bureau-level discrimination power while serving borrowers who lack formal credit records. These findings give lenders and regulators a transparent blueprint for extending fair and safe credit access to the underbanked.