Large language models (LLMs) have revolutionized human-machine interaction and have been extended by embedding diverse modalities, such as images, into a shared language space. Yet neural decoding has remained constrained by static, non-interactive methods. We introduce CorText, a framework that integrates neural activity directly into the latent space of an LLM, enabling open-ended, natural language interaction with brain data. Trained on fMRI data recorded during viewing of natural scenes, CorText generates accurate image captions and answers detailed questions better than controls, despite having access only to neural data. We show that CorText achieves zero-shot generalization beyond semantic categories seen during training. In-silico microstimulation experiments, which enable counterfactual prompts on brain activity, reveal a consistent, graded mapping between brain state and language output. These advances mark a shift from passive decoding toward generative, flexible interfaces between brain activity and language.