We investigate machine learning models for stock return prediction in non-stationary environments, revealing a fundamental nonstationarity-complexity tradeoff: complex models reduce misspecification error but require longer training windows that introduce stronger non-stationarity. We resolve this tension with a novel model selection method that jointly optimizes model class and training window size using a tournament procedure that adaptively evaluates candidates on non-stationary validation data. Our theoretical analysis demonstrates that this approach balances misspecification error, estimation variance, and non-stationarity, performing close to the best model in hindsight. Applying our method to 17 industry portfolio returns, we consistently outperform standard rolling-window benchmarks, improving out-of-sample $R^2$ by 14-23% on average. During NBER-designated recessions, the improvements are substantial: our method achieves a positive $R^2$ during the Gulf War recession while the benchmarks' $R^2$ is negative, improves $R^2$ in absolute terms by at least 80 bps during the 2001 recession, and also performs better during the 2008 Financial Crisis. Economically, a trading strategy based on our selected model generates 31% higher cumulative returns on average across the industries.
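The abstract does not specify the tournament rule, so the following is only a hypothetical sketch of the general idea, not the authors' procedure. Each candidate is a (number of predictors, training-window length) pair for rolling OLS, standing in for "model class" and "window size." Each pairwise match looks back from the newest validation point and stops at the shortest window in which the squared-error gap exceeds a z-score threshold, so recent data are preferred when they are informative. The names `match`, `tournament`, `r2_oos`, the parameters `z` and `min_len`, and the synthetic coefficient break are all assumptions made for illustration.

```python
import numpy as np


def fit_predict_ols(X_train, y_train, x_new):
    """OLS with intercept fit on the training rows; returns a one-step forecast."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return float(np.concatenate([[1.0], x_new]) @ beta)


def rolling_losses(X, y, t0, t1, n_features, window):
    """Squared errors on y[t0:t1] for OLS on the first `n_features` predictors,
    refit each period on the most recent `window` observations (needs t0 >= 1)."""
    losses = np.empty(t1 - t0)
    for t in range(t0, t1):
        lo = max(0, t - window)
        pred = fit_predict_ols(X[lo:t, :n_features], y[lo:t], X[t, :n_features])
        losses[t - t0] = (y[t] - pred) ** 2
    return losses


def match(loss_a, loss_b, z=2.0, min_len=5):
    """Hypothetical pairwise rule: grow the look-back window from the newest
    validation point and stop once the mean loss gap is distinguishable
    from noise; the candidate with lower recent loss wins."""
    d = loss_a - loss_b  # ordered oldest -> newest
    for m in range(min_len, len(d) + 1):
        recent = d[-m:]
        se = recent.std(ddof=1) / np.sqrt(m)
        if abs(recent.mean()) > z * se:
            return "a" if recent.mean() < 0 else "b"
    return "a" if d.mean() <= 0 else "b"  # no clear gap: full-sample tiebreak


def tournament(candidates, X, y, t0, t1):
    """Single-elimination bracket over (n_features, window) candidates."""
    losses = {c: rolling_losses(X, y, t0, t1, *c) for c in candidates}
    pool = list(candidates)
    while len(pool) > 1:
        survivors = [a if match(losses[a], losses[b]) == "a" else b
                     for a, b in zip(pool[0::2], pool[1::2])]
        if len(pool) % 2:
            survivors.append(pool[-1])  # odd candidate out gets a bye
        pool = survivors
    return pool[0]


def r2_oos(y, pred, bench):
    """Out-of-sample R^2 of `pred` relative to a benchmark forecast `bench`."""
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - bench) ** 2)


# Synthetic example: coefficients flip sign at t = 150 (a structural break).
rng = np.random.default_rng(0)
T, K = 300, 4
X = rng.standard_normal((T, K))
sign = np.where(np.arange(T)[:, None] < 150, 1.0, -1.0)
y = np.sum(X * sign * np.array([0.5, 0.3, 0.0, 0.0]), axis=1) + 0.5 * rng.standard_normal(T)

cands = [(k, w) for k in (1, 2, 4) for w in (30, 60, 120)]
best = tournament(cands, X, y, t0=200, t1=300)
print("selected (n_features, window):", best)
```

Because each match may use a different look-back length, pairwise outcomes need not be transitive, which is why a bracket rather than a single argmin over full-sample validation error is used here; the bracket order and the threshold `z` are design choices of this sketch.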