Masked reconstruction is a fundamental pretext task for self-supervised learning: by reconstructing masked segments of large amounts of unlabeled data, the model improves its feature extraction capabilities. In human activity recognition, this pretext task has typically relied on a masking strategy applied along the time dimension. However, such a strategy fails to fully exploit the inherent characteristics of wearable sensor data and overlooks the coupling of information between channels, which limits its potential as a pretext task. To address these limitations, we propose a novel masking strategy called Channel Masking. It masks the sensor data along the channel dimension, compelling the encoder to extract channel-related features while performing the masked reconstruction task. Moreover, Channel Masking can be seamlessly combined with masking strategies along the time dimension, so that the self-supervised model performs masked reconstruction in both the time and channel dimensions. We name the combined strategies Time-Channel Masking and Span-Channel Masking. Finally, we adapt the reconstruction loss function to incorporate the reconstruction loss in both the time and channel dimensions. We evaluate the proposed masking strategies on three public datasets, and the experimental results show that they outperform prior strategies in both self-supervised and semi-supervised scenarios.
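As a minimal illustration of the idea, the following NumPy sketch masks a sensor window of shape (time, channels) along the channel dimension, combines it with a simple time-step mask, and computes a loss over both kinds of masked positions. It is a sketch under stated assumptions, not the implementation evaluated in this work: the function names, the zero-fill masking value, the masking ratios, the union of the two masks, and the weighted-sum loss with weight `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np


def channel_mask(x, mask_ratio=0.3, rng=None):
    """Zero out whole channels of x, shape (time, channels).

    Returns the masked copy and a boolean mask (True where masked).
    Zero-filling and the ratio are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_channels = x.shape[1]
    n_masked = max(1, int(round(mask_ratio * n_channels)))
    masked_channels = rng.choice(n_channels, size=n_masked, replace=False)
    mask = np.zeros(x.shape, dtype=bool)
    mask[:, masked_channels] = True
    return np.where(mask, 0.0, x), mask


def time_mask(x, mask_ratio=0.15, rng=None):
    """Zero out whole time steps of x, shape (time, channels)."""
    rng = np.random.default_rng() if rng is None else rng
    n_time = x.shape[0]
    n_masked = max(1, int(round(mask_ratio * n_time)))
    masked_steps = rng.choice(n_time, size=n_masked, replace=False)
    mask = np.zeros(x.shape, dtype=bool)
    mask[masked_steps, :] = True
    return np.where(mask, 0.0, x), mask


def time_channel_mask(x, time_ratio=0.15, channel_ratio=0.3, rng=None):
    """Apply both masks; a position is hidden if either mask covers it."""
    rng = np.random.default_rng() if rng is None else rng
    _, t_mask = time_mask(x, time_ratio, rng)
    _, c_mask = channel_mask(x, channel_ratio, rng)
    return np.where(t_mask | c_mask, 0.0, x), t_mask, c_mask


def _masked_mse(x, x_hat, mask):
    """Mean squared error restricted to masked positions."""
    if not mask.any():
        return 0.0
    return float(np.mean((x[mask] - x_hat[mask]) ** 2))


def reconstruction_loss(x, x_hat, t_mask, c_mask, alpha=0.5):
    """Weighted sum of time-masked and channel-masked errors.

    The weighting scheme is an assumption, not the loss defined in the paper.
    """
    time_term = _masked_mse(x, x_hat, t_mask)
    channel_term = _masked_mse(x, x_hat, c_mask)
    return alpha * time_term + (1.0 - alpha) * channel_term
```

In a training loop, the masked window would be fed to the encoder and decoder, and `reconstruction_loss` would compare the decoder output `x_hat` with the original window `x`. Masking contiguous spans of time steps instead of isolated ones would give an analogous sketch of the span-based variant.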