Time series processing and feature extraction are crucial and time-intensive steps in conventional machine learning pipelines. Existing packages are limited in their applicability, as they cannot cope with irregularly sampled or asynchronous data and make strong assumptions about the data format. Moreover, these packages do not focus on execution speed and memory efficiency, resulting in considerable overhead. We present $\texttt{tsflex}$, a Python toolkit for time series processing and feature extraction that focuses on performance and flexibility, enabling broad applicability. This toolkit leverages window-stride arguments of the same data type as the sequence-index, and maintains the sequence-index through all operations. $\texttt{tsflex}$ is flexible as it (1) supports multivariate time series, (2) supports multiple window-stride configurations, and (3) integrates with processing and feature functions from other packages, while (4) making no assumptions about the data sampling regularity, series alignment, or data type. Other functionalities include multiprocessing, detailed execution logging, chunking sequences, and serialization. Benchmarks show that $\texttt{tsflex}$ is faster and more memory-efficient than similar packages, while being more permissive and flexible in its utilization.
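To make the window-stride idea concrete, the following is a minimal sketch in plain Python of strided-window feature extraction in which the window and stride share the data type of the sequence-index differences. It is not the $\texttt{tsflex}$ API: the function name `strided_window_features`, its parameters, and the sample data are assumptions made for illustration only, and the choice to label each window by its end is likewise an assumption.

```python
from datetime import datetime, timedelta
from statistics import mean


def strided_window_features(index, values, window, stride, func):
    """Apply `func` to each half-open window [start, start + window).

    Hypothetical helper for illustration. `window` and `stride` must be
    addable to the index values (timedelta for datetimes, int for ints),
    so no regular sampling rate is assumed.
    Returns a list of (window_end, feature) pairs, keeping the index type.
    """
    if not index:
        return []
    features = []
    start, last = index[0], index[-1]
    # Only emit windows that fit entirely within the observed span.
    while start + window <= last:
        end = start + window
        segment = [v for t, v in zip(index, values) if start <= t < end]
        if segment:  # skip windows that contain no samples
            features.append((end, func(segment)))
        start = start + stride
    return features


# Irregularly sampled data: offsets (in seconds) are not evenly spaced.
offsets = [0, 1, 4, 5, 9, 12, 13, 19]
values = [1.0, 2.0, 4.0, 3.0, 5.0, 7.0, 6.0, 8.0]

# Datetime index with timedelta window and stride.
t0 = datetime(2024, 1, 1)
dt_index = [t0 + timedelta(seconds=s) for s in offsets]
dt_feats = strided_window_features(
    dt_index, values, timedelta(seconds=10), timedelta(seconds=5), mean
)

# Integer index with integer window and stride: same logic, same features.
int_feats = strided_window_features(offsets, values, 10, 5, mean)
```

Because the helper only relies on addition and comparison of index values, the same code handles a datetime index and an integer index, and each output row stays tied to a position on the original index rather than to a sample count.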