Recent studies employ a two-stage supervised framework to generate, from EEG, images that depict what a human perceives in response to visual stimuli, a task referred to as EEG-visual reconstruction. These methods, however, cannot reproduce the exact visual stimulus, since it is the human-specified annotation of the images, not the image data themselves, that determines what the synthesized images depict. Moreover, the synthesized images often suffer from noisy EEG encodings and the unstable training of generative models, making them hard to recognize. We instead present a single-stage EEG-visual retrieval paradigm in which the data of the two modalities, rather than their annotations, are correlated, allowing us to recover the exact visual stimulus for a given EEG clip. We maximize the mutual information between an EEG encoding and its associated visual stimulus by optimizing a contrastive self-supervised objective, which brings two additional benefits. First, since learning is not directed at class annotations, the EEG encodings generalize to visual classes unseen during training. Second, the model no longer has to generate every detail of the visual stimulus; it instead focuses on cross-modal alignment and retrieves images at the instance level, ensuring distinguishable model output. We conduct empirical studies on the largest single-subject EEG dataset measuring brain activity evoked by image stimuli. We demonstrate that the proposed approach completes an instance-level EEG-visual retrieval task that existing methods cannot, and we examine the implications of a range of EEG and visual encoder architectures. Furthermore, on the widely studied semantic-level EEG-visual classification task, the proposed method, despite using no class annotations, outperforms state-of-the-art supervised EEG-visual reconstruction approaches, particularly in its capacity for open-class recognition.
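The contrastive alignment and instance-level retrieval described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`info_nce_loss`, `retrieve`), the embedding dimensionality, and the temperature value are all assumptions; a symmetric InfoNCE objective is used as a standard stand-in for a contrastive mutual-information bound.

```python
import numpy as np

def info_nce_loss(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired EEG/image embeddings.

    Maximizing a lower bound on cross-modal mutual information amounts to
    pulling each EEG embedding toward its paired image embedding and
    pushing it away from the other images in the batch (and vice versa).
    Illustrative sketch; the temperature value is an assumption.
    """
    # L2-normalize so dot products become cosine similarities.
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = eeg @ img.T / temperature          # (B, B) similarity matrix
    labels = np.arange(len(logits))             # matching pairs on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Symmetric loss: EEG -> image and image -> EEG directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

def retrieve(eeg_query, gallery):
    """Instance-level retrieval: rank gallery images by cosine similarity
    to a single EEG embedding; best match first."""
    q = eeg_query / np.linalg.norm(eeg_query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))
```

Because the objective operates on paired data rather than class labels, a correctly paired batch yields a lower loss than a mispaired one, and at test time retrieval reduces to a nearest-neighbor search in the shared embedding space.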