Brain-Diffuser: Natural scene reconstruction from fMRI signals using generative latent diffusion

In neural decoding research, one of the most intriguing topics is the reconstruction of perceived natural images from fMRI signals. Previous studies have succeeded in recreating different aspects of the visual input, such as low-level properties (shape, texture, layout) or high-level features (object categories, descriptive semantics of scenes), but have typically failed to reconstruct these properties together for complex scene images. Generative AI has recently made a leap forward with latent diffusion models capable of generating highly complex images. Here, we investigate how to take advantage of this technology for brain decoding. We present a two-stage scene reconstruction framework called "Brain-Diffuser". In the first stage, starting from fMRI signals, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Variational Autoencoder) model. In the second stage, we use the image-to-image framework of a latent diffusion model (Versatile Diffusion), conditioned on predicted multimodal (text and visual) features, to generate the final reconstructed images. On the publicly available Natural Scenes Dataset benchmark, our method outperforms previous models both qualitatively and quantitatively. When applied to synthetic fMRI patterns generated from individual ROI (region-of-interest) masks, our trained model creates compelling "ROI-optimal" scenes consistent with neuroscientific knowledge. Thus, the proposed methodology can have an impact on both applied (e.g., brain-computer interfaces) and fundamental neuroscience.
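The abstract describes the two-stage pipeline only at a high level. The sketch below shows one way the stages could be wired together, assuming (this is not stated in the abstract) that each mapping from fMRI voxels to a feature space (VDVAE latents, text features, visual features) is a linear ridge regression fit on training trials. The VDVAE decoder and the Versatile Diffusion image-to-image step are not reimplemented: `decode_low_level` and `img2img` are hypothetical placeholder callables, and `fit_ridge`, `roi_synthetic_fmri`, and the `maps` keys are illustrative names. The synthetic ROI pattern (a constant inside the mask, zeros elsewhere) is likewise an assumption about how such inputs might be built.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression with an intercept (assumed fMRI-to-feature mapping).

    X: (n_trials, n_voxels) fMRI responses; Y: (n_trials, n_features) target features.
    Returns (W, b) so that predictions are X @ W + b.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    # Solve (Xc^T Xc + alpha * I) W = Xc^T Yc instead of forming an explicit inverse.
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    # The intercept is fit on the uncentered data, so it is not shrunk by alpha.
    b = y_mean - x_mean @ W
    return W, b


def predict(X, W, b):
    """Apply a fitted linear mapping to fMRI trials of shape (n_trials, n_voxels)."""
    return X @ W + b


def roi_synthetic_fmri(n_voxels, roi_mask, value=1.0):
    """Hypothetical synthetic pattern: `value` inside the ROI mask, zero elsewhere."""
    x = np.zeros(n_voxels)
    x[roi_mask] = value
    return x[None, :]  # shape (1, n_voxels): one synthetic "trial"


def reconstruct(x, maps, decode_low_level, img2img):
    """Two-stage flow. `decode_low_level` and `img2img` are hypothetical stand-ins
    for the VDVAE decoder and the Versatile Diffusion image-to-image step."""
    # Stage 1: fMRI -> predicted VDVAE latents -> coarse image with low-level layout.
    z = predict(x, *maps["vdvae"])
    init_image = decode_low_level(z)
    # Stage 2: fMRI -> predicted text and visual conditioning features, then an
    # image-to-image pass that starts from the stage-1 image.
    text_feat = predict(x, *maps["text"])
    vision_feat = predict(x, *maps["vision"])
    return img2img(init_image, text_feat, vision_feat)
```

The same `reconstruct` call accepts either a measured fMRI trial or a pattern from `roi_synthetic_fmri`, which mirrors how a trained decoder can be probed with ROI-only inputs. The primal solve above scales cubically with the number of voxels; with tens of thousands of voxels, the dual (kernel) form or an iterative solver would usually be the more practical choice.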
