This paper presents a systematic review of Python packages with a focus on time series analysis. The objective is to provide (1) an overview of the different time series analysis tasks and preprocessing methods implemented, and (2) an overview of the development characteristics of the packages (e.g., documentation, dependencies, and community size). This review is based on a search of literature databases as well as GitHub repositories. Following the filtering process, 40 packages were analyzed. We classified the packages according to the analysis tasks implemented, the methods related to data preparation, and the means for evaluating the results produced (methods and access to evaluation data). We also reviewed documentation aspects, the licenses, the size of the packages' community, and the dependencies used. Among other things, our results show that forecasting is by far the most frequently implemented task, that half of the packages provide access to real datasets or allow generating synthetic data, and that many packages depend on a few libraries (the most used ones being numpy, scipy and pandas). We hope that this review can help practitioners and researchers navigate the space of Python packages dedicated to time series analysis. We will provide an updated list of the reviewed packages online at https://siebert-julien.github.io/time-series-analysis-python/.