理解时序表格和多变量时间序列的模型复杂度：以Numerai数据科学竞赛为例的案例研究 (Understanding Model Complexity for temporal tabular and multi-variate time series, case study with Numerai data science tournament) - 专知论文

会员服务 ·

0

时间序列 · 序列 · 序列建模 · 时序 · 多变量 ·

2023 年 4 月 19 日

Understanding Model Complexity for temporal tabular and multi-variate time series, case study with Numerai data science tournament

翻译：理解时序表格和多变量时间序列的模型复杂度：以Numerai数据科学竞赛为例的案例研究

Thomas Wong,Mauricio Barahona

In this paper, we explore the use of different feature engineering and dimensionality reduction methods in multi-variate time-series modelling. Using a feature-target cross correlation time series dataset created from Numerai tournament, we demonstrate under over-parameterised regime, both the performance and predictions from different feature engineering methods converge to the same equilibrium, which can be characterised by the reproducing kernel Hilbert space. We suggest a new Ensemble method, which combines different random non-linear transforms followed by ridge regression for modelling high dimensional time-series. Compared to some commonly used deep learning models for sequence modelling, such as LSTM and transformers, our method is more robust (lower model variance over different random seeds and less sensitive to the choice of architecture) and more efficient. An additional advantage of our method is model simplicity as there is no need to use sophisticated deep learning frameworks such as PyTorch. The learned feature rankings are then applied to the temporal tabular prediction problem in the Numerai tournament, and the predictive power of feature rankings obtained from our method is better than the baseline prediction model based on moving averages

翻译：在本文中，我们探究了多元时间序列建模中不同特征工程和降维方法的使用。我们使用了从Numerai竞赛创建的特征-目标交叉相关时间序列数据集，在过度参数化的制度下，我们证明了不同特征工程方法的性能和预测会收敛到相同的平衡态，可以由再生内积希尔伯特空间来描述。我们提出了一个新的集合方法，该方法结合了不同的随机非线性变换，然后使用岭回归进行高维时间序列建模。与某些常用的序列建模深度学习模型（如LSTM和transformers）相比，我们的方法更加健壮（在不同的随机种子和架构选择之间具有较低的模型方差）和更加高效。我们方法的另一个优点是模型的简单性，因为不需要使用复杂的深度学习框架，如PyTorch。学习到的特征排名然后被应用于Numerai竞赛中的时序表格预测问题，我们方法得到的特征排名的预测能力优于基于移动平均的基线预测模型。

0

相关内容

时间序列

时间序列（或称动态数列）是指将同一统计指标的数值按其发生的时间先后顺序排列而成的数列。时间序列分析的主要目的是根据已有的历史数据对未来进行预测。经济数据中大多数以时间序列的形式给出。根据观察时间的不同，时间序列中的时间可以是年份、季度、月份或其他任何时间形式。

【2023新书】随机模型基础，815页pdf

【2023新书】随机模型基础，815页pdf

专知会员服务

104+阅读 · 2023年5月10日

【干货书】数据分析优化，Optimization for Modern Data Analysis，117页pdf

【干货书】数据分析优化，Optimization for Modern Data Analysis，117页pdf

专知会员服务

63+阅读 · 2023年2月15日

干货书！基于单调算子的大规模凸优化，348页pdf

干货书！基于单调算子的大规模凸优化，348页pdf

专知会员服务

50+阅读 · 2022年7月24日

【干货书】深度学习数学：理解神经网络，347页pdf

【干货书】深度学习数学：理解神经网络，347页pdf

专知会员服务

267+阅读 · 2022年7月3日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

【经典书】机器学习：基础、算法与应用，593页pdf，图文并茂带你掌握机器学习

【经典书】机器学习：基础、算法与应用，593页pdf，图文并茂带你掌握机器学习

专知会员服务

128+阅读 · 2022年5月25日

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

专知会员服务

310+阅读 · 2020年2月26日

【ECML-PKDD 2019】序列和时间序列学习的有效线性模型（Effective Linear Models for Learning with Sequences and Time Series），Georgiana Ifrim

【ECML-PKDD 2019】序列和时间序列学习的有效线性模型（Effective Linear Models for Learning with Sequences and Time Series），Georgiana Ifrim

专知会员服务

35+阅读 · 2019年12月1日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

专知

22+阅读 · 2020年2月26日

一行TensorFlow/Keras代码解决真实场景中数据不平衡(imbalanced)问题

一行TensorFlow/Keras代码解决真实场景中数据不平衡(imbalanced)问题

专知

78+阅读 · 2019年5月31日

时序数据异常检测工具/数据集大列表

时序数据异常检测工具/数据集大列表

极市平台

65+阅读 · 2019年2月23日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】免费书(草稿)：数据科学的数学基础

【推荐】免费书(草稿)：数据科学的数学基础

机器学习研究会

20+阅读 · 2017年10月1日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

【推荐】(Keras)LSTM多元时序预测教程

【推荐】(Keras)LSTM多元时序预测教程

机器学习研究会

24+阅读 · 2017年8月14日

求解时间依赖问题的隐式时空并行 Schwarz 算法研究

国家自然科学基金

0+阅读 · 2017年12月31日

Underlay频谱共享方式下信号参数估计和调制识别的方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

半参数回归模型中随机误差分布的检验问题

国家自然科学基金

2+阅读 · 2015年12月31日

非均质量子器件Schr？dinger-Poisson系统多尺度分析与算法研究

国家自然科学基金

0+阅读 · 2014年12月31日

随机右删失数据半参数回归模型的光滑FIC平均估计理论及其应用

国家自然科学基金

0+阅读 · 2013年12月31日

局部波动特征分解(LOD)方法及其在机械故障诊断中的应用研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于Realized GARCH框架的波动率和相关性模型理论和应用研究

国家自然科学基金

0+阅读 · 2012年12月31日

金融连续时间随机过程的统计推断

国家自然科学基金

0+阅读 · 2012年12月31日

Numbl-TRAF6-TAB2对NF-kappa B活性的调节在小胶质细胞炎性活化中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

半参数回归分析的随机函数法及其高维情形

国家自然科学基金

2+阅读 · 2012年12月31日

Time Interpret: a Unified Model Interpretability Library for Time Series

Arxiv

0+阅读 · 2023年6月5日

Data Quality in Imitation Learning

Arxiv

0+阅读 · 2023年6月4日

Inexact iterative numerical linear algebra for neural network-based spectral estimation and rare-event prediction

Arxiv

0+阅读 · 2023年6月3日

Robust Collaborative Learning with Linear Gradient Overhead

Arxiv

0+阅读 · 2023年6月3日

Discovering COVID-19 Coughing and Breathing Patterns from Unlabeled Data Using Contrastive Learning with Varying Pre-Training Domains

Arxiv

0+阅读 · 2023年6月2日

ThinkSum: Probabilistic reasoning over sets using large language models

Arxiv

0+阅读 · 2023年6月2日

Numerical Investigation of the Fractional Oscillation Equations under the Context of Variable Order Caputo Fractional Derivative via Fractional Order Bernstein Wavelets

Arxiv

0+阅读 · 2023年6月1日

Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding

Arxiv

0+阅读 · 2023年6月1日

Learning Sampling Dictionaries for Efficient and Generalizable Robot Motion Planning with Transformers

Arxiv

0+阅读 · 2023年6月1日

One for All: Neural Joint Modeling of Entities and Events

Arxiv

11+阅读 · 2018年12月1日

VIP会员

文章信息

相关主题

相关VIP内容

【2023新书】随机模型基础，815页pdf

【2023新书】随机模型基础，815页pdf

专知会员服务

104+阅读 · 2023年5月10日

【干货书】数据分析优化，Optimization for Modern Data Analysis，117页pdf

【干货书】数据分析优化，Optimization for Modern Data Analysis，117页pdf

专知会员服务

63+阅读 · 2023年2月15日

干货书！基于单调算子的大规模凸优化，348页pdf

干货书！基于单调算子的大规模凸优化，348页pdf

专知会员服务

50+阅读 · 2022年7月24日

【干货书】深度学习数学：理解神经网络，347页pdf

【干货书】深度学习数学：理解神经网络，347页pdf

专知会员服务

267+阅读 · 2022年7月3日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

【经典书】机器学习：基础、算法与应用，593页pdf，图文并茂带你掌握机器学习

【经典书】机器学习：基础、算法与应用，593页pdf，图文并茂带你掌握机器学习

专知会员服务

128+阅读 · 2022年5月25日

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

专知会员服务

310+阅读 · 2020年2月26日

【ECML-PKDD 2019】序列和时间序列学习的有效线性模型（Effective Linear Models for Learning with Sequences and Time Series），Georgiana Ifrim

【ECML-PKDD 2019】序列和时间序列学习的有效线性模型（Effective Linear Models for Learning with Sequences and Time Series），Georgiana Ifrim

专知会员服务

35+阅读 · 2019年12月1日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《小型无人机系统侦测追踪技术：声学、计算机视觉与深度学习融合方案》最新98页

《"牧羊人网格"拦截策略：实现无人机集群可靠拦截的新范式》

光纤无人机：反无人机系统的重大挑战

《作战建模与仿真实证研究》

相关资讯

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

专知

22+阅读 · 2020年2月26日

一行TensorFlow/Keras代码解决真实场景中数据不平衡(imbalanced)问题

一行TensorFlow/Keras代码解决真实场景中数据不平衡(imbalanced)问题

专知

78+阅读 · 2019年5月31日

时序数据异常检测工具/数据集大列表

时序数据异常检测工具/数据集大列表

极市平台

65+阅读 · 2019年2月23日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】免费书(草稿)：数据科学的数学基础

【推荐】免费书(草稿)：数据科学的数学基础

机器学习研究会

20+阅读 · 2017年10月1日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

【推荐】(Keras)LSTM多元时序预测教程

【推荐】(Keras)LSTM多元时序预测教程

机器学习研究会

24+阅读 · 2017年8月14日

相关论文

Time Interpret: a Unified Model Interpretability Library for Time Series

Arxiv

0+阅读 · 2023年6月5日

Data Quality in Imitation Learning

Arxiv

0+阅读 · 2023年6月4日

Inexact iterative numerical linear algebra for neural network-based spectral estimation and rare-event prediction

Arxiv

0+阅读 · 2023年6月3日

Robust Collaborative Learning with Linear Gradient Overhead

Arxiv

0+阅读 · 2023年6月3日

Discovering COVID-19 Coughing and Breathing Patterns from Unlabeled Data Using Contrastive Learning with Varying Pre-Training Domains

Arxiv

0+阅读 · 2023年6月2日

ThinkSum: Probabilistic reasoning over sets using large language models

Arxiv

0+阅读 · 2023年6月2日

Numerical Investigation of the Fractional Oscillation Equations under the Context of Variable Order Caputo Fractional Derivative via Fractional Order Bernstein Wavelets

Arxiv

0+阅读 · 2023年6月1日

Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding

Arxiv

0+阅读 · 2023年6月1日

Learning Sampling Dictionaries for Efficient and Generalizable Robot Motion Planning with Transformers

Arxiv

0+阅读 · 2023年6月1日

One for All: Neural Joint Modeling of Entities and Events

Arxiv

11+阅读 · 2018年12月1日

相关基金

求解时间依赖问题的隐式时空并行 Schwarz 算法研究

国家自然科学基金

0+阅读 · 2017年12月31日

Underlay频谱共享方式下信号参数估计和调制识别的方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

半参数回归模型中随机误差分布的检验问题

国家自然科学基金

2+阅读 · 2015年12月31日

非均质量子器件Schr？dinger-Poisson系统多尺度分析与算法研究

国家自然科学基金

0+阅读 · 2014年12月31日

随机右删失数据半参数回归模型的光滑FIC平均估计理论及其应用

国家自然科学基金

0+阅读 · 2013年12月31日

局部波动特征分解(LOD)方法及其在机械故障诊断中的应用研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于Realized GARCH框架的波动率和相关性模型理论和应用研究

国家自然科学基金

0+阅读 · 2012年12月31日

金融连续时间随机过程的统计推断

国家自然科学基金

0+阅读 · 2012年12月31日

Numbl-TRAF6-TAB2对NF-kappa B活性的调节在小胶质细胞炎性活化中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

半参数回归分析的随机函数法及其高维情形

国家自然科学基金

2+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员