In this paper, we explore the use of different feature engineering and dimensionality reduction methods in multivariate time-series modelling. Using a feature-target cross-correlation time-series dataset created from the Numerai tournament, we demonstrate that, in the over-parameterised regime, both the performance and the predictions of different feature engineering methods converge to the same equilibrium, which can be characterised by a reproducing kernel Hilbert space. We propose a new ensemble method that combines different random non-linear transforms followed by ridge regression for modelling high-dimensional time series. Compared to deep learning models commonly used for sequence modelling, such as LSTMs and transformers, our method is more robust (lower model variance across random seeds and less sensitivity to the choice of architecture) and more efficient. An additional advantage of our method is model simplicity, as there is no need for sophisticated deep learning frameworks such as PyTorch. The learned feature rankings are then applied to the temporal tabular prediction problem in the Numerai tournament, and the predictive power of the feature rankings obtained from our method is better than that of the baseline prediction model based on moving averages.
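To make the ensemble idea concrete, the following is a minimal sketch of one possible instantiation: each ensemble member applies a fixed random non-linear feature map (here a random ReLU projection, assumed for illustration) followed by ridge regression, and predictions are averaged across random seeds. The feature map, hyperparameters, and use of scikit-learn's Ridge are assumptions for this sketch, not the exact implementation described in the paper.

```python
import numpy as np
from sklearn.linear_model import Ridge


def random_relu_features(X, n_features, seed):
    """One random non-linear transform: a fixed random Gaussian
    projection followed by a ReLU non-linearity (illustrative choice)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_features)) / np.sqrt(X.shape[1])
    return np.maximum(X @ W, 0.0)


def fit_ensemble(X_train, y_train, n_models=10, n_features=4096, alpha=1.0):
    """Fit one ridge regression per random transform; each member
    keeps its own seed so its feature map can be reproduced at test time."""
    models = []
    for seed in range(n_models):
        Z = random_relu_features(X_train, n_features, seed)
        models.append((seed, Ridge(alpha=alpha).fit(Z, y_train)))
    return models


def predict_ensemble(models, X, n_features=4096):
    """Ensemble prediction: average the ridge predictions over all
    random non-linear transforms."""
    preds = [model.predict(random_relu_features(X, n_features, seed))
             for seed, model in models]
    return np.mean(preds, axis=0)
```

Averaging over several random transforms is what gives the method its robustness to the choice of seed: no single random projection dominates the final prediction, and the only tunable quantities are the feature dimension and the ridge penalty.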