Time series processing and feature extraction are crucial and time-intensive steps in conventional machine learning pipelines. Existing packages are limited in their real-world applicability, as they cannot cope with irregularly sampled and asynchronous data. We therefore present $\texttt{tsflex}$, a domain-independent, flexible, and sequence-first Python toolkit for processing and feature extraction that is capable of handling irregularly sampled sequences with unaligned measurements. The toolkit is sequence-first in that (1) sequence-based arguments are leveraged for strided-window feature extraction, and (2) the sequence index is maintained through all supported operations. $\texttt{tsflex}$ is flexible as it (1) natively supports multivariate time series, (2) supports multiple window-stride configurations, and (3) integrates with processing and feature functions from other packages, while (4) making no assumptions about the regularity or synchronization of the data sampling rate. Other functionalities of the package include multiprocessing, in-depth execution time logging, support for categorical and time-based data, sequence chunking, and embedded serialization. $\texttt{tsflex}$ is developed to enable fast and memory-efficient time series processing and feature extraction. Results indicate that $\texttt{tsflex}$ is more flexible than similar packages while outperforming these toolkits in both runtime and memory usage.
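To make the idea of sequence-based strided-window extraction concrete, the following is a minimal sketch in plain pandas and NumPy. It is not the $\texttt{tsflex}$ API or its implementation: the helper name `strided_window_features`, the convention of labeling each window by its end time, and the toy data are all assumptions made for illustration. The point is that the window and stride are expressed in time units rather than sample counts, so irregular sampling needs no resampling step, and the output stays indexed by time.

```python
import numpy as np
import pandas as pd


def strided_window_features(series, window, stride, funcs):
    """Hypothetical helper: apply each function in `funcs` to the
    time-based windows [t, t + window), moving t forward by `stride`."""
    window = pd.Timedelta(window)
    stride = pd.Timedelta(stride)
    start = series.index[0]
    last = series.index[-1]
    rows = {}
    while start + window <= last:
        end = start + window
        # Select by timestamp, not by position, so gaps are handled naturally.
        chunk = series[(series.index >= start) & (series.index < end)]
        values = chunk.to_numpy()
        # Empty windows (gaps in the data) yield NaN for every feature.
        rows[end] = {
            name: (func(values) if len(values) else np.nan)
            for name, func in funcs.items()
        }
        start += stride
    # Assumption: each output row is labeled with the end of its window.
    return pd.DataFrame.from_dict(rows, orient="index")


# Irregularly sampled toy signal: gaps of 1 s, 3 s, 1 s, 4 s, and 1 s.
index = pd.to_datetime([
    "2024-01-01 00:00:00",
    "2024-01-01 00:00:01",
    "2024-01-01 00:00:04",
    "2024-01-01 00:00:05",
    "2024-01-01 00:00:09",
    "2024-01-01 00:00:10",
])
signal = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=index, name="signal")

features = strided_window_features(
    signal, window="4s", stride="2s", funcs={"mean": np.mean, "count": len}
)
print(features)
```

Because the windows are defined on the time axis, the number of samples per window varies with the local sampling density, which the `count` column makes visible.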