A major bottleneck in training robust Human Activity Recognition (HAR) models is the need for large-scale labeled sensor datasets. Because labeling large amounts of sensor data is expensive, unsupervised and semi-supervised learning techniques have emerged that can learn good features from the data without requiring any labels. In this paper, we extend this line of research and present a novel technique called Collaborative Self-Supervised Learning (ColloSSL), which leverages unlabeled data collected from multiple devices worn by a user to learn high-quality features of the data. A key insight that underpins the design of ColloSSL is that unlabeled sensor datasets simultaneously captured by multiple devices can be viewed as natural transformations of each other, and leveraged to generate a supervisory signal for representation learning. We present three technical innovations to extend conventional self-supervised learning algorithms to a multi-device setting: a Device Selection approach which selects positive and negative devices to enable contrastive learning, a Contrastive Sampling algorithm which samples positive and negative examples in a multi-device setting, and a loss function called Multi-view Contrastive Loss which extends standard contrastive loss to a multi-device setting. Our experimental results on three multi-device datasets show that ColloSSL outperforms both fully-supervised and semi-supervised learning techniques in a majority of the experimental settings, resulting in an absolute increase of up to 7.9% in F_1 score compared to the best-performing baselines. We also show that ColloSSL outperforms fully-supervised methods in a low-data regime, using as little as one-tenth of the available labeled data in the best case.
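To make the multi-device formulation concrete, the sketch below illustrates one plausible shape of a multi-view contrastive loss, where time-aligned windows from positive devices serve as positives for the anchor device and windows from negative devices serve as negatives. This is a minimal illustration written in PyTorch under our own assumptions (function name, tensor layout, and InfoNCE-style formulation are not taken from the paper), not ColloSSL's exact loss.

```python
import torch
import torch.nn.functional as F

def multi_device_contrastive_loss(anchor_emb, positive_embs, negative_embs, temperature=0.1):
    """Illustrative multi-device contrastive loss (a sketch, not the paper's exact formulation).

    anchor_emb:    (B, D) embeddings of windows from the anchor device
    positive_embs: list of (B, D) tensors, one per positive device (time-aligned with the anchor)
    negative_embs: list of (B, D) tensors, one per negative device
    """
    anchor = F.normalize(anchor_emb, dim=-1)

    # Cosine similarity between the anchor and each positive device's aligned window.
    pos_sims = [torch.sum(anchor * F.normalize(p, dim=-1), dim=-1) for p in positive_embs]
    # Cosine similarity between the anchor and each negative device's window.
    neg_sims = [torch.sum(anchor * F.normalize(n, dim=-1), dim=-1) for n in negative_embs]

    pos = torch.stack(pos_sims, dim=1) / temperature   # (B, P)
    neg = torch.stack(neg_sims, dim=1) / temperature   # (B, N)

    # For each positive, contrast it against all negatives (InfoNCE-style):
    # the correct class is always index 0, i.e. the positive similarity.
    logits = torch.cat(
        [pos.unsqueeze(2), neg.unsqueeze(1).expand(-1, pos.size(1), -1)], dim=2
    )  # (B, P, 1 + N)
    labels = torch.zeros(pos.size(0), pos.size(1), dtype=torch.long, device=pos.device)
    return F.cross_entropy(logits.reshape(-1, logits.size(2)), labels.reshape(-1))
```

In this sketch, which devices populate `positive_embs` and `negative_embs` would be decided by the Device Selection and Contrastive Sampling steps described above; the loss itself only assumes that positives are synchronized views of the same activity and negatives are not.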