Time-series classification is one of the most frequently performed tasks in industrial data science, and the tabular representation is one of the most widely used data representations in industrial settings. In this work, we propose a novel, scalable architecture for learning representations from tabular time-series data and subsequently performing downstream tasks such as time-series classification. The representation learning framework is end-to-end, akin to bidirectional encoder representations from transformers (BERT) in language modeling; however, we introduce a novel masking technique suitable for pretraining on time-series data. Additionally, we use one-dimensional convolutions augmented with transformers and explore their effectiveness, since time-series datasets lend themselves naturally to one-dimensional convolutions. We also propose a novel timestamp embedding technique that handles both periodic cycles at different time granularities and aperiodic trends present in the time-series data. Our proposed model is end-to-end, can handle both categorical and continuous-valued inputs, and does not require any quantization or encoding of continuous features.
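The abstract does not spell out how the timestamp embedding is constructed, so the following is a minimal illustrative sketch only, not the paper's method: one common way to capture periodic cycles at several granularities is sin/cos features over day, week, and year periods, with a normalized raw-time feature left in for aperiodic trends. All names here (`TimestampEmbedding`, `PERIODS`, `d_model`) are hypothetical.

```python
import math
import torch
import torch.nn as nn

class TimestampEmbedding(nn.Module):
    """Hypothetical multi-granularity timestamp embedding.

    Periodic cycles (day, week, year) are encoded as sin/cos pairs;
    an extra normalized raw-time feature lets a downstream model
    pick up aperiodic trends. A learned linear layer projects the
    raw features to the model dimension.
    """

    # Period lengths in seconds: day, week, year (assumed granularities).
    PERIODS = (86_400.0, 7 * 86_400.0, 365.25 * 86_400.0)

    def __init__(self, d_model: int):
        super().__init__()
        # 2 features (sin, cos) per period + 1 linear trend feature.
        n_raw = 2 * len(self.PERIODS) + 1
        self.proj = nn.Linear(n_raw, d_model)

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: Unix timestamps in seconds, shape (batch, seq_len).
        feats = []
        for p in self.PERIODS:
            phase = 2 * math.pi * (t % p) / p
            feats += [torch.sin(phase), torch.cos(phase)]
        # Normalized raw time carries the aperiodic (trend) component.
        feats.append(t / self.PERIODS[-1])
        return self.proj(torch.stack(feats, dim=-1))

# Usage: embed synthetic timestamps for 2 series of 5 steps each.
emb = TimestampEmbedding(d_model=64)
t = torch.rand(2, 5) * 1e9
print(emb(t).shape)  # torch.Size([2, 5, 64])
```

The design choice illustrated here is that periodicity is hard-coded via fixed periods while the trend component stays learnable through the linear projection; the paper's actual embedding may learn the periods themselves or differ in other respects.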