Foundation models have transformed AI by reducing reliance on task-specific data through large-scale pretraining. While successful in language and vision, their adoption in EEG has lagged due to the heterogeneity of public datasets, which are collected under varying protocols, devices, and electrode configurations. Existing EEG foundation models struggle to generalize across these variations, often restricting pretraining to a single setup, resulting in suboptimal performance, particularly under linear probing. We present REVE (Representation for EEG with Versatile Embeddings), a pretrained model explicitly designed to generalize across diverse EEG signals. REVE introduces a novel 4D positional encoding scheme that enables it to process signals of arbitrary length and electrode arrangement. Using a masked autoencoding objective, we pretrain REVE on over 60,000 hours of EEG data from 92 datasets spanning 25,000 subjects, representing the largest EEG pretraining effort to date. REVE achieves state-of-the-art results on 10 downstream EEG tasks, including motor imagery classification, seizure detection, sleep staging, cognitive load estimation, and emotion recognition. With little to no fine-tuning, it demonstrates strong generalization and nuanced spatio-temporal modeling. We release code, pretrained weights, and tutorials to support standardized EEG research and accelerate progress in clinical neuroscience.
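To make the idea of a 4D positional encoding concrete, below is a minimal, hypothetical sketch assuming each EEG token is indexed by its electrode's 3D position (x, y, z) in a head-centred coordinate frame plus a patch time index t, with standard sinusoidal features computed per axis and concatenated. The abstract does not specify REVE's actual scheme; the function names, frequency base, and coordinate convention here are illustrative assumptions only.

```python
# Hypothetical sketch of a 4D positional encoding for EEG tokens
# (3D electrode coordinates + patch time index). Not the paper's implementation.
import numpy as np

def sinusoidal_features(value: float, dim: int) -> np.ndarray:
    """Standard sine/cosine features for one scalar coordinate (dim must be even)."""
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
    angles = value * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def encode_4d_position(x: float, y: float, z: float, t: int,
                       dim_per_axis: int = 32) -> np.ndarray:
    """Concatenate per-axis features into one embedding of size 4 * dim_per_axis."""
    return np.concatenate([
        sinusoidal_features(coord, dim_per_axis)
        for coord in (x, y, z, float(t))
    ])

# Example: a token from an electrode near the vertex (Cz-like position) at patch index 5.
emb = encode_4d_position(x=0.0, y=0.0, z=1.0, t=5)
print(emb.shape)  # (128,)
```

Because the encoding depends only on continuous electrode coordinates and a time index, rather than a fixed channel ordering, tokens from different montages and recording lengths share one positional space, which is one plausible way such a model could accept arbitrary electrode arrangements.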