Physiological signals such as electrocardiograms (ECG) and electroencephalograms (EEG) provide complementary insights into human health and cognition, yet multi-modal integration is challenging due to limited multi-modal labeled data and modality-specific differences. To address these challenges, we use a pre-trained CBraMod encoder for EEG and pre-train a symmetric ECG encoder, equipping each modality with a rich foundational representation. For the ECG branch, we adapt the CBraMod architecture for large-scale self-supervised pretraining, introducing a dual-masking strategy that captures both intra- and inter-lead dependencies. The resulting representations are fused via simple embedding concatenation, allowing the classification head to learn cross-modal interactions and enabling effective downstream learning despite limited multi-modal supervision. Evaluated on emotion recognition, our approach achieves near state-of-the-art performance, demonstrating that carefully designed physiological encoders substantially improve downstream performance even with straightforward fusion. These results highlight the potential of foundation-model approaches to harness the holistic nature of physiological signals, enabling scalable, label-efficient, and generalizable solutions for healthcare and affective computing.
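To make the dual-masking idea concrete, the sketch below masks random time patches within each lead (intra-lead) and entire leads at once (inter-lead) on a patched ECG tensor, so a reconstruction loss can then be computed on the masked positions. This is a minimal PyTorch sketch under assumed conventions: the tensor layout, the masking ratios, and the name `dual_mask` are hypothetical illustrations, not the actual CBraMod or pretraining implementation.

```python
import torch

def dual_mask(x, intra_ratio=0.3, inter_ratio=0.2):
    """Hypothetical sketch of a dual-masking scheme for patched ECG.

    x: (batch, leads, patches, patch_len) tensor of ECG patches.
    Returns the masked input and a boolean mask (True = masked),
    so a reconstruction objective can target masked positions only.
    """
    b, l, p, _ = x.shape
    # Intra-lead masking: hide random time patches within each lead.
    intra = torch.rand(b, l, p, device=x.device) < intra_ratio
    # Inter-lead masking: hide every patch of randomly chosen leads.
    inter = (torch.rand(b, l, 1, device=x.device) < inter_ratio).expand(b, l, p)
    mask = intra | inter
    # Zero out masked patches (broadcast mask over the patch dimension).
    x_masked = x.masked_fill(mask.unsqueeze(-1), 0.0)
    return x_masked, mask

# Example: a batch of 12-lead ECG split into 30 patches of 64 samples.
x = torch.randn(8, 12, 30, 64)
x_masked, mask = dual_mask(x)
```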
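The fusion step can likewise be made concrete. The following minimal sketch, assuming fixed-size per-modality embeddings, concatenates ECG and EEG embeddings and passes them to a small classification head that learns cross-modal interactions; the embedding dimensions, hidden size, class count, and the name `ConcatFusionHead` are placeholder assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

class ConcatFusionHead(nn.Module):
    """Hypothetical late-fusion classifier: concatenate per-modality
    embeddings and learn cross-modal interactions in a small MLP head."""

    def __init__(self, ecg_dim=256, eeg_dim=256, hidden=128, n_classes=4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(ecg_dim + eeg_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, ecg_emb, eeg_emb):
        # Simple embedding concatenation along the feature dimension.
        fused = torch.cat([ecg_emb, eeg_emb], dim=-1)
        return self.head(fused)

# Example: fuse one batch of frozen-encoder embeddings.
head = ConcatFusionHead()
logits = head(torch.randn(8, 256), torch.randn(8, 256))
```

Because interaction modeling is deferred entirely to the head, this late-fusion design leaves the pretrained encoders untouched, which is what allows effective downstream learning with only limited multi-modal supervision.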