Research on time series forecasting has predominantly focused on developing methods that improve accuracy. However, other criteria such as training time or latency are critical in many real-world applications. We therefore address the question of how to choose an appropriate forecasting model for a given dataset among the plethora of available forecasting methods when accuracy is only one of many criteria. To this end, our contributions are twofold. First, we present a comprehensive benchmark, evaluating 7 classical and 6 deep learning forecasting methods on 44 heterogeneous, publicly available datasets. The benchmark code is open-sourced along with the evaluations and forecasts for all methods. These evaluations enable us to answer open questions such as how much data is required for deep learning models to outperform classical ones. Second, we leverage the benchmark evaluations to learn good defaults that consider multiple objectives such as accuracy and latency. By learning a mapping from forecasting models to performance metrics, we show that our method PARETOSELECT is able to accurately select models from the Pareto front -- alleviating the need to train or evaluate many forecasting models for model selection. To the best of our knowledge, PARETOSELECT constitutes the first method to learn default models in a multi-objective setting.
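The multi-objective selection idea above can be illustrated with a minimal sketch. This is not the paper's PARETOSELECT implementation; it only shows what "selecting models from the Pareto front" means when each model is scored on two metrics to be minimized (here, forecast error and latency). The model names and numbers are hypothetical.

```python
def pareto_front(scores):
    """Return the models whose metric tuples are not dominated.

    `scores` maps a model name to a tuple of metrics to minimize
    (e.g. (error, latency)). A model is dominated if some other
    model is no worse on every metric and differs on at least one.
    """
    front = {}
    for name, s in scores.items():
        dominated = any(
            all(o <= m for o, m in zip(other, s)) and other != s
            for other_name, other in scores.items()
            if other_name != name
        )
        if not dominated:
            front[name] = s
    return front

# Hypothetical benchmark results: (error, latency in seconds).
scores = {
    "DeepModel":  (0.10, 2.0),
    "Classical":  (0.15, 0.1),
    "Naive":      (0.30, 0.01),
    "SlowAndBad": (0.40, 5.0),  # worse than every other model on both metrics
}
print(sorted(pareto_front(scores)))  # SlowAndBad is dominated and dropped
```

Each model on the returned front represents a different accuracy/latency trade-off, so a default can be picked from it without training or evaluating the dominated candidates.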