Visual reconstruction algorithms are interpretive tools that map brain activity to pixels. Past reconstruction algorithms employed brute-force search through a massive library to select candidate images that, when passed through an encoding model, accurately predict brain activity. Here, we use conditional generative diffusion models to extend and improve this search-based strategy. We decode a semantic descriptor from human brain activity (7T fMRI) in voxels across most of visual cortex, then use a diffusion model to sample a small library of images conditioned on this descriptor. We pass each sample through an encoding model, select the images that best predict brain activity, and then use these images to seed another library. We show that this process converges on high-quality reconstructions by refining low-level image details while preserving semantic content across iterations. Interestingly, the time-to-convergence differs systematically across visual cortex, suggesting a succinct new way to measure the diversity of representations across visual brain areas.
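The loop described above can be summarized as a minimal sketch, assuming hypothetical callables for the three components it names: `decode_descriptor` (the semantic decoder), `sample_images` (the conditional diffusion model), and `encode` (the voxel-wise encoding model). The names, signatures, and stopping rule below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def reconstruct(fmri_activity, decode_descriptor, sample_images, encode,
                library_size=100, top_k=8, max_iters=10, tol=1e-3):
    """Iteratively refine a small image library until its encoding-model
    predictions stop improving against the measured brain activity.
    All model callables are hypothetical placeholders."""
    descriptor = decode_descriptor(fmri_activity)   # semantic conditioning vector
    seeds = None                                    # first library has no seed images
    best_score = -np.inf

    for _ in range(max_iters):
        # Sample a library of candidates from the conditional diffusion model.
        library = sample_images(descriptor, seeds=seeds, n=library_size)

        # Score each candidate by how well its predicted activity matches the data.
        scores = np.array([
            np.corrcoef(encode(img), fmri_activity)[0, 1] for img in library
        ])

        # Keep the best-predicting images and use them to seed the next library.
        top_idx = np.argsort(scores)[-top_k:]
        seeds = [library[i] for i in top_idx]

        # Stop once the best score has converged (assumed criterion).
        if scores[top_idx[-1]] - best_score < tol:
            break
        best_score = scores[top_idx[-1]]

    return seeds, best_score
```

In this sketch, convergence is declared when the top encoding-model score stops improving; the paper's observation that time-to-convergence varies across visual areas would correspond to running this loop with activity restricted to different regions of interest.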