Using a comprehensive sample of 2,585 bankruptcies from 1990 to 2019, we benchmark the performance of various machine learning models in predicting the financial distress of publicly traded U.S. firms. We find that gradient boosted trees outperform the other models in one-year-ahead forecasts. Variable permutation tests show that excess stock returns, idiosyncratic risk, and relative size are the most important predictors. Textual features derived from corporate filings do not materially improve performance. In a credit competition model that accounts for the asymmetric cost of default misclassification, the survival random forest captures large dollar profits.
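The abstract does not spell out how the variable permutation tests are computed. As a hedged illustration of the general technique, the sketch below shuffles one predictor at a time and records the average drop in discrimination, so a variable whose shuffling hurts the model most ranks as most important. The use of ROC AUC as the metric, the fixed linear scorer standing in for a fitted gradient boosted tree, and the synthetic data are all assumptions made for illustration, not the authors' setup or results.

```python
import numpy as np


def auc(y_true, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formula; assumes no tied scores."""
    y_true = np.asarray(y_true)
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in AUC when each column of X is shuffled, breaking its link to y."""
    rng = np.random.default_rng(seed)
    baseline = auc(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # shuffle one column only
            drops.append(baseline - auc(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


# Hypothetical synthetic data (not the paper's sample): the default probability
# depends strongly on column 0, weakly on column 1, and not at all on column 2.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 3))
logits = 2.0 * X[:, 0] + 0.5 * X[:, 1] - 2.0
y = (rng.random(2000) < 1 / (1 + np.exp(-logits))).astype(int)


# Hypothetical stand-in for a fitted model such as gradient boosted trees.
def predict(X):
    return 2.0 * X[:, 0] + 0.5 * X[:, 1]


imp = permutation_importance(predict, X, y)
print(imp)  # column 2 is unused by predict, so its importance is exactly 0.0
```

Because the importance is measured on the fitted model rather than by refitting, it is cheap to compute for any black-box learner; with a real model one would typically evaluate it on held-out data.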