Applying deep learning algorithms to financial data is difficult because of strong non-stationarity, which can lead to over-fitted models that underperform under regime changes. Using the Numerai tournament data set as a motivating example, we propose a machine learning pipeline for trading market-neutral stock portfolios based on tabular data that is robust to changes in market conditions. We evaluate various machine learning models, including Gradient Boosting Decision Trees (GBDTs) and Neural Networks, with and without simple feature engineering, as the building blocks of the pipeline. We find that GBDT models with dropout display high performance, robustness and generalisability with relatively low complexity and reduced computational cost. We then show that online learning techniques can be used in post-prediction processing to enhance the results. In particular, dynamic feature neutralisation, an efficient procedure that requires no retraining and can be applied post-prediction to any machine learning model, improves robustness by reducing drawdown in volatile market conditions. Furthermore, we demonstrate that building model ensembles through dynamic model selection based on recent model performance improves on the baseline in terms of both the Sharpe and Calmar ratios. Finally, we evaluate the robustness of our pipeline across different data splits and random seeds and find that the results are reproducible.
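To make the idea of post-prediction feature neutralisation concrete, the following is a minimal sketch rather than the pipeline's actual implementation. It assumes that neutralisation means subtracting from the predictions the component that a least-squares fit on the features can explain linearly, within a single time period (era). The function name `neutralise`, the `proportion` parameter and the synthetic data are illustrative assumptions. The dynamic part, that is, how the features or the proportion are chosen from recent data, is not shown.

```python
import numpy as np


def neutralise(predictions, features, proportion=1.0):
    """Remove the linear feature exposure from one era's predictions.

    predictions: 1-D array of model outputs, shape (n_samples,).
    features:    2-D array, shape (n_samples, n_features).
    proportion:  fraction of the linear exposure to subtract (assumed knob).
    """
    # Least-squares fit of the predictions on the features (no intercept).
    coeffs, *_ = np.linalg.lstsq(features, predictions, rcond=None)
    exposure = features @ coeffs

    # Subtract the chosen fraction of the linearly explained component.
    residual = predictions - proportion * exposure

    # Rescale to unit standard deviation; skip if the residual is constant.
    scale = residual.std()
    return residual / scale if scale > 0 else residual


# Synthetic example (hypothetical data): predictions that load on 5 features.
rng = np.random.default_rng(0)
demo_features = rng.normal(size=(500, 5))
demo_predictions = demo_features @ rng.normal(size=5) + rng.normal(size=500)
demo_neutral = neutralise(demo_predictions, demo_features)

# With proportion=1.0 the result is orthogonal to every feature column.
max_exposure = np.abs(demo_features.T @ demo_neutral).max()
```

Because the step only needs the predictions and the feature matrix, it can be applied to the output of any model without retraining, which is the property the abstract highlights.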