In this paper, we present a robust deep incremental learning model for regression tasks on financial temporal tabular datasets. Using commonly available tabular and time-series prediction models as building blocks, a machine-learning model is built incrementally to adapt to distributional shifts in data. Using the concept of self-similarity, the model uses only a basic building block of machine learning methods, decision trees to build models of any required complexity. The model is demonstrated to have robust performances under adverse situations such as regime changes, fat-tailed distributions and low signal-to-noise ratios which is common in financial datasets. Model robustness are studied under different hyper-parameters such as model complexity and data sampling settings using XGBoost models trained on the Numerai dataset as a detailed case study. The two layer deep ensemble of XGBoost models over different model snapshots is demonstrated to deliver high quality predictions under different market regimes. Comparing the XGBoost models with different number of boosting rounds in three scenarios (small, standard and large), we demonstrated the model performances are monotonic increasing with respect to model sizes and converges towards the generalisation upper bound. Our model is efficient with much lower hardware requirement than other machine learning models as no specialised neural architectures are used and each base model can be independently trained in parallel.
翻译:暂无翻译