In recent years, self-supervised learning (SSL) frameworks have been extensively applied to sensor-based Human Activity Recognition (HAR) to learn deep representations without data annotations. While SSL frameworks achieve performance nearly comparable to supervised models, studies interpreting the representations learnt by SSL models remain limited. Nevertheless, modern explainability methods could help to unravel the differences between SSL and supervised representations: how they are learnt, what properties of the input data they preserve, and when SSL should be chosen over supervised training. In this paper, we aim to analyze the deep representations of two recent SSL frameworks, namely SimCLR and VICReg. Specifically, the emphasis is placed on (i) comparing the robustness of supervised and SSL models to corruptions in input data; (ii) explaining the predictions of deep learning models using saliency maps and highlighting which input channels are most used for predicting various activities; and (iii) exploring the properties encoded in SSL and supervised representations using probing. Extensive experiments on two single-device datasets (MobiAct and UCI-HAR) show that self-supervised representations are significantly more robust to noise in unseen data than supervised models. In contrast, features learnt by the supervised approaches are more homogeneous across subjects and better encode the nature of activities.