In this paper, we explore the use of different feature engineering and dimensionality reduction methods in multivariate time-series modelling. Using a feature-target cross-correlation time-series dataset created from the Numerai tournament, we demonstrate that, in the over-parameterised regime, both the performance and the predictions of different feature engineering methods converge to the same equilibrium, which can be characterised by the reproducing kernel Hilbert space. We propose a new ensemble method that combines different random non-linear transforms followed by ridge regression for modelling high-dimensional time series. Compared with commonly used deep learning models for sequence modelling, such as LSTMs and Transformers, our method is more robust (lower model variance across random seeds and less sensitivity to the choice of architecture) and more efficient. An additional advantage of our method is model simplicity, as there is no need for sophisticated deep learning frameworks such as PyTorch. The learned feature rankings are then applied to the temporal tabular prediction problem in the Numerai tournament, where the predictive power of the feature rankings obtained from our method exceeds that of a baseline prediction model based on moving averages.
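To make the core idea concrete, the following is a minimal sketch (not the authors' exact implementation) of a random non-linear transform followed by ridge regression, ensembled over random seeds; the function names, feature width, regularisation strength, and choice of tanh non-linearity are illustrative assumptions.

```python
import numpy as np

def random_feature_ridge(X_train, y_train, X_test, n_features=512,
                         alpha=1.0, seed=0):
    """Project inputs through a random tanh feature map, then fit ridge
    regression in closed form. All hyperparameters here are placeholders."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    W = rng.normal(scale=1.0 / np.sqrt(d), size=(d, n_features))
    b = rng.uniform(-np.pi, np.pi, size=n_features)

    Z_train = np.tanh(X_train @ W + b)   # random non-linear transform
    Z_test = np.tanh(X_test @ W + b)

    # Closed-form ridge solution: (Z'Z + alpha*I)^{-1} Z'y
    A = Z_train.T @ Z_train + alpha * np.eye(n_features)
    beta = np.linalg.solve(A, Z_train.T @ y_train)
    return Z_test @ beta

def ensemble_predict(X_train, y_train, X_test, seeds=range(10)):
    """Average predictions across random seeds to reduce model variance."""
    preds = [random_feature_ridge(X_train, y_train, X_test, seed=s)
             for s in seeds]
    return np.mean(preds, axis=0)
```

Because each member of the ensemble has a closed-form fit, this style of model avoids iterative training and a deep learning framework altogether, which is the efficiency and simplicity advantage noted above.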