In mobile sensing environments, the sensors on mobile devices continually generate vast amounts of data. Analyzing this ever-growing data is challenging because annotated data is scarce and the environment changes constantly. To address the lack of labeled datasets, recent work has used self-supervised learning as a pre-training step that improves the performance of conventional supervised models. This research examines the impact of using a self-supervised representation learning model for time series classification tasks in which data becomes available incrementally. We propose and evaluate a workflow in which a model first learns to extract informative features from a corpus of unlabeled time series data, and classification is then performed on labeled data using the features the model extracts. We analyze how varying the size, distribution, and source of the unlabeled data affects final classification performance across four public datasets covering various sensor types in diverse applications.
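The two-stage workflow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic sine-wave "sensor" windows, the linear autoencoder solved in closed form by PCA (a simple stand-in for the self-supervised model, which the abstract does not specify), the 1-nearest-neighbor classifier, and all function names and parameters.

```python
import numpy as np


def make_windows(rng, n, freq, length=32, noise=0.1):
    """Hypothetical sensor windows: a sine with `freq` cycles per window,
    a random phase per window, and additive Gaussian noise."""
    t = np.arange(length) / length
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(n, 1))
    signal = np.sin(2.0 * np.pi * freq * t + phase)
    return signal + noise * rng.standard_normal((n, length))


def fit_encoder(unlabeled, n_features=4):
    """Stage 1: learn features from unlabeled windows only.
    Stand-in pretext task: reconstruct the input through a low-dimensional
    linear bottleneck, which PCA solves in closed form."""
    mean = unlabeled.mean(axis=0)
    _, _, vt = np.linalg.svd(unlabeled - mean, full_matrices=False)
    return mean, vt[:n_features]


def encode(encoder, windows):
    """Project windows onto the learned components."""
    mean, components = encoder
    return (windows - mean) @ components.T


def nearest_neighbor_predict(train_x, train_y, test_x):
    """Stage 2: 1-nearest-neighbor classification in feature space."""
    dists = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    return train_y[dists.argmin(axis=1)]


def run_workflow(n_unlabeled=400, n_labeled=20, n_test=100, seed=0):
    """Pretrain on unlabeled data, then classify a small labeled set."""
    rng = np.random.default_rng(seed)
    half = n_unlabeled // 2
    # Unlabeled corpus: a mix of both signal types, with no labels kept.
    unlabeled = np.vstack([make_windows(rng, half, 1),
                           make_windows(rng, half, 3)])
    encoder = fit_encoder(unlabeled)

    # Small labeled set (class 0: 1 cycle per window, class 1: 3 cycles).
    train_x = np.vstack([make_windows(rng, n_labeled, 1),
                         make_windows(rng, n_labeled, 3)])
    train_y = np.repeat([0, 1], n_labeled)
    test_x = np.vstack([make_windows(rng, n_test, 1),
                        make_windows(rng, n_test, 3)])
    test_y = np.repeat([0, 1], n_test)

    pred = nearest_neighbor_predict(encode(encoder, train_x), train_y,
                                    encode(encoder, test_x))
    return float((pred == test_y).mean())


if __name__ == "__main__":
    print("accuracy:", run_workflow())
```

Calling `run_workflow` with different `n_unlabeled` values, or building the unlabeled corpus from a different mix of signals, loosely mirrors the kind of size and distribution variation the abstract says was studied, though on toy data rather than the four public datasets.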