Analyzing time series better with limited human effort is of interest to both academia and industry. Driven by business scenarios, we organized the first Automated Time Series Regression challenge (AutoSeries) for the WSDM Cup 2020. We present its design, analysis, and post-hoc experiments. Because solutions had to be submitted as code, participants could not intervene manually, so the challenge tested the automated machine learning capabilities of solutions across many datasets under hardware and time limitations. We prepared 10 datasets from diverse application domains (sales, power consumption, air quality, traffic, and parking), featuring missing data, mixed continuous and categorical variables, and various sampling rates. Each dataset was split into a training sequence and a test sequence; the test sequence was streamed, allowing models to adapt continuously. The setting of time series regression differs from classical forecasting in that the covariates at the present time are known. Participants made great strides in tackling this AutoSeries problem, as demonstrated by the jump in performance over the sample submission and by post-hoc comparisons with AutoGluon. They used simple yet effective methods based on feature engineering, LightGBM, and random search hyper-parameter tuning, addressing all aspects of the challenge. Our post-hoc analyses revealed that providing additional time did not yield significant improvements. The winners' code was open-sourced at https://www.4paradigm.com/competition/autoseries2020.
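To make the regression setting and the streamed test protocol concrete, here is a minimal, dependency-free sketch under stated assumptions. At each test step the model sees the current covariates x_t (known, unlike in classical forecasting) plus lagged targets, predicts y_t, and only then receives the true y_t, which it may use to update itself. The SGD linear model is a hypothetical stand-in for LightGBM, chosen only to avoid external dependencies. The lag features, the two tuned hyper-parameters (`n_lags`, `lr`) and their ranges, the RMSE score, and the synthetic series are all illustrative assumptions, not the challenge's or the winners' actual pipeline.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = Tuple[List[float], float]  # (covariates x_t known at time t, target y_t)


class OnlineLinearRegressor:
    """Hypothetical stand-in for LightGBM: a linear model updated by per-sample SGD."""

    def __init__(self, n_features: int, lr: float) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x: Sequence[float]) -> float:
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x: Sequence[float], y: float) -> None:
        err = self.predict(x) - y  # gradient of 0.5 * squared error w.r.t. the output
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def make_features(x: Sequence[float], history: List[float], n_lags: int) -> List[float]:
    # Regression, not forecasting: the current covariates x_t are available,
    # and we append lagged targets y_{t-1}, ..., y_{t-n_lags} (0.0 when missing).
    lags = [history[-k] if len(history) >= k else 0.0 for k in range(1, n_lags + 1)]
    return list(x) + lags


def fit(train: List[Row], n_lags: int, lr: float, epochs: int = 50) -> OnlineLinearRegressor:
    # Several SGD passes over the training sequence, in time order.
    model = OnlineLinearRegressor(len(train[0][0]) + n_lags, lr)
    for _ in range(epochs):
        history: List[float] = []
        for x, y in train:
            model.update(make_features(x, history, n_lags), y)
            history.append(y)
    return model


def stream_rmse(model: OnlineLinearRegressor, past: List[Row], test: List[Row],
                n_lags: int, adapt: bool = True) -> float:
    # Test rows arrive one at a time: predict y_t from x_t and past targets,
    # then the true y_t is revealed and (optionally) used to update the model.
    history = [y for _, y in past]
    sq = []
    for x, y in test:
        feats = make_features(x, history, n_lags)
        sq.append((model.predict(feats) - y) ** 2)
        if adapt:
            model.update(feats, y)
        history.append(y)
    return math.sqrt(sum(sq) / len(sq))


def random_search(train: List[Row], n_trials: int = 6, seed: int = 0) -> Tuple[int, float]:
    # Random search over two hypothetical hyper-parameters, scored on the tail of train.
    rng = random.Random(seed)
    cut = int(0.8 * len(train))
    fit_part, valid_part = train[:cut], train[cut:]
    best = (math.inf, 1, 0.01)
    for _ in range(n_trials):
        n_lags = rng.choice([1, 2, 3])
        lr = 10 ** rng.uniform(math.log10(0.003), math.log10(0.02))  # log-uniform
        model = fit(fit_part, n_lags, lr)
        score = stream_rmse(model, fit_part, valid_part, n_lags)
        best = min(best, (score, n_lags, lr))
    return best[1], best[2]


def synthetic_series(n: int, seed: int = 1) -> List[Row]:
    # Toy data (assumed): y_t = 2 * x_t + 0.5 * y_{t-1} + Gaussian noise.
    rng = random.Random(seed)
    rows, prev = [], 0.0
    for _ in range(n):
        x = rng.random()
        y = 2.0 * x + 0.5 * prev + rng.gauss(0.0, 0.1)
        rows.append(([x], y))
        prev = y
    return rows


if __name__ == "__main__":
    data = synthetic_series(250)
    train, test = data[:200], data[200:]
    n_lags, lr = random_search(train)
    model = fit(train, n_lags, lr)
    print(n_lags, round(lr, 4), round(stream_rmse(model, train, test, n_lags), 3))
```

Because each test target is revealed only after its prediction, `stream_rmse` can call `update` without leaking future information; passing `adapt=False` instead evaluates a frozen model, which is a simple way to compare the two regimes.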