Time series are measured and analyzed across the sciences. One way to quantify the structure of a time series is to compute a set of summary statistics, or 'features', and then represent the series as a feature vector of these properties. The resulting feature space is interpretable and informative, and allows conventional statistical learning approaches, including clustering, regression, and classification, to be applied to time-series datasets. Many open-source software packages for computing sets of time-series features exist across multiple programming languages, including catch22 (22 features: MATLAB, R, Python, Julia), feasts (42 features: R), tsfeatures (63 features: R), Kats (40 features: Python), tsfresh (779 features: Python), and TSFEL (390 features: Python). However, several issues remain: (i) no single access point to these packages currently exists; (ii) to access all feature sets, users must be fluent in multiple languages; and (iii) these feature-extraction packages lack extensive accompanying methodological pipelines for performing feature-based time-series analysis, such as applications to time-series classification. Here we introduce a solution to these issues in an R software package called theft: Tools for Handling Extraction of Features from Time series. theft is a unified and extensible framework for computing features from the six open-source time-series feature sets listed above. It also includes a suite of functions for processing extracted features and interpreting their performance, including extensive data-visualization templates, low-dimensional projections, and time-series classification operations. With the increasing volume and complexity of time-series datasets in the sciences and industry, theft provides a standardized framework for comprehensively quantifying and interpreting informative structure in time series.
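As an illustration of the feature-based representation described above, the following minimal Python sketch maps each time series to a short named feature vector and stacks the vectors into a feature matrix. It does not use theft or any of the six feature sets listed; the function names `extract_features` and `feature_matrix` and the three chosen features (mean, standard deviation, lag-1 autocorrelation) are hypothetical choices made only for this example.

```python
import numpy as np

# Hypothetical feature names for this sketch; not taken from any listed package.
FEATURE_NAMES = ["mean", "std", "acf_lag1"]


def extract_features(x):
    """Map one univariate time series to a small dictionary of features."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    sum_sq = np.sum(centered ** 2)
    # Lag-1 autocorrelation: lagged cross-products over the sum of squares.
    # A constant series has zero variance, so its autocorrelation is set to 0.
    acf1 = np.sum(centered[:-1] * centered[1:]) / sum_sq if sum_sq > 0 else 0.0
    return {
        "mean": float(x.mean()),
        "std": float(np.sqrt(sum_sq / x.size)),
        "acf_lag1": float(acf1),
    }


def feature_matrix(series_list):
    """Stack feature vectors into an array of shape (n_series, n_features)."""
    rows = [[extract_features(s)[name] for name in FEATURE_NAMES]
            for s in series_list]
    return np.array(rows)


# Two synthetic series with different temporal structure.
rng = np.random.default_rng(0)
white_noise = rng.normal(size=500)
random_walk = np.cumsum(rng.normal(size=500))

X = feature_matrix([white_noise, random_walk])
```

Each row of `X` is one time series and each column is one interpretable feature, so standard clustering, regression, or classification methods can be applied to `X` directly. In this example the lag-1 autocorrelation column separates the uncorrelated noise from the strongly autocorrelated random walk.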