Understanding the hidden mechanisms behind human visual perception is a fundamental question in neuroscience. To that end, investigating the neural responses underlying human mental activity, for example through functional Magnetic Resonance Imaging (fMRI), has been a significant research vehicle. However, analyzing fMRI signals is challenging and costly, and it demands professional training. Despite remarkable progress in fMRI analysis, existing approaches are limited to generating 2D images and remain far from being biologically meaningful and practically useful. Motivated by this, we propose to generate visually plausible and functionally more comprehensive 3D outputs decoded from brain signals, enabling more sophisticated modeling of fMRI data. Conceptually, we reformulate this task as an {\em fMRI-conditioned 3D object generation} problem. We design a novel 3D object representation learning method, Brain3D, that takes as input the fMRI data of a subject who was presented with a 2D image, and yields as output the corresponding 3D object. The key capabilities of this model include handling noise with high-level semantic signals and a two-stage architecture for progressive integration of high-level information. Extensive experiments validate the superior capability of our model over previous state-of-the-art 3D object generation methods. Importantly, we show that our model captures the distinct functionalities of each region of the human visual system as well as their intricate interplay, aligning remarkably well with established discoveries in neuroscience. Further, preliminary evaluations indicate that, in simulated scenarios, Brain3D can successfully identify disordered brain regions within the human visual system, such as V1, V2, V3, V4, and the medial temporal lobe (MTL). Our data and code will be available at https://brain-3d.github.io/.