Adapting Neural Audio Codecs to EEG

EEG and audio are inherently distinct modalities, differing in sampling rate, channel structure, and scale. Yet we show that pretrained neural audio codecs can serve as effective starting points for EEG compression, provided that the data are preprocessed to satisfy the codec's input constraints. Using DAC, a state-of-the-art neural audio codec, as our base, we demonstrate that raw EEG can be mapped into the codec's stride-based framing, enabling direct reuse of the audio-pretrained encoder-decoder. Even without modification, this setup yields stable EEG reconstructions, and fine-tuning on EEG data improves fidelity and generalization further than training from scratch does. We systematically explore compression-quality trade-offs by varying residual codebook depth, codebook (vocabulary) size, and input sampling rate. To capture spatial dependencies across electrodes, we propose DAC-MC, a multi-channel extension that adds attention-based cross-channel aggregation and channel-specific decoding while retaining the audio-pretrained initialization. Evaluations on the TUH Abnormal and Epilepsy datasets show that the adapted codecs preserve clinically relevant information, as reflected in spectrogram-based reconstruction loss and downstream classification accuracy.

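The stride-based framing and the compression-quality trade-off mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the helper names `frame_for_codec` and `rvq_bitrate`, the peak normalization, and the default hop length of 512 are all assumptions chosen for illustration. The first function scales each EEG channel into [-1, 1] and zero-pads it to a whole number of codec frames; the second estimates the bitrate of a residual vector quantizer as frames per second times codebook depth times bits per code.

```python
import math
import numpy as np

def frame_for_codec(eeg, hop_length=512):
    """Peak-normalize each channel and zero-pad to a multiple of hop_length.

    hop_length is the product of the encoder strides; 512 is an assumed
    illustrative value, not one stated in the abstract.
    """
    eeg = np.asarray(eeg, dtype=np.float32)
    # Per-channel peak amplitude, kept so the scale can be restored after decoding.
    peak = np.max(np.abs(eeg), axis=-1, keepdims=True)
    scaled = eeg / np.maximum(peak, 1e-8)
    # Number of zeros needed to reach the next multiple of hop_length.
    pad = (-eeg.shape[-1]) % hop_length
    pad_width = [(0, 0)] * (eeg.ndim - 1) + [(0, pad)]
    return np.pad(scaled, pad_width), peak

def rvq_bitrate(sample_rate, hop_length, n_codebooks, codebook_size):
    """Bits per second for one channel: frames/s * depth * log2(vocabulary)."""
    frames_per_second = sample_rate / hop_length
    return frames_per_second * n_codebooks * math.log2(codebook_size)

# Hypothetical example: 19 electrodes, 1000 samples each.
segment = np.random.randn(19, 1000)
framed, peaks = frame_for_codec(segment)
bits_per_second = rvq_bitrate(sample_rate=256, hop_length=512,
                              n_codebooks=9, codebook_size=1024)
```

Under these assumed settings, lowering the codebook depth, the codebook size, or the input sampling rate each reduces the bitrate, which is the axis along which the abstract's trade-off study varies.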