Visual reconstruction algorithms are an interpretive tool that maps brain activity to pixels. Past reconstruction algorithms employed brute-force search through a massive library to select candidate images that, when passed through an encoding model, accurately predict brain activity. Here, we use conditional generative diffusion models to extend and improve this search-based strategy. We decode a semantic descriptor from human brain activity (7T fMRI) in voxels across most of visual cortex, then use a diffusion model to sample a small library of images conditioned on this descriptor. We pass each sample through an encoding model, select the images that best predict brain activity, and then use these images to seed another library. We show that this process converges on high-quality reconstructions by refining low-level image details while preserving semantic content across iterations. Interestingly, the time-to-convergence differs systematically across visual cortex, suggesting a succinct new way to measure the diversity of representations across visual brain areas.
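The iterative sample-score-select loop described above can be summarized in a minimal sketch. The interfaces here are assumptions, not the authors' implementation: a hypothetical `diffusion_sample(descriptor, seeds, n)` that draws candidate images from the conditional diffusion model, and a hypothetical `encoding_model(image)` that predicts voxel-wise responses; scoring uses Pearson correlation with the measured activity.

```python
import numpy as np

def iterative_reconstruction(measured_activity, descriptor,
                             diffusion_sample, encoding_model,
                             library_size=100, keep_top=10, n_iters=6):
    """Refine reconstructions by repeated sample-score-select rounds (illustrative sketch)."""
    seeds = None  # the first library is sampled from the decoded descriptor alone
    for _ in range(n_iters):
        # 1. Sample a small library conditioned on the semantic descriptor
        #    (and, after the first round, on the previously selected images).
        library = diffusion_sample(descriptor, seeds, library_size)

        # 2. Score each candidate by how well its predicted activity
        #    matches the measured brain activity.
        scores = []
        for image in library:
            predicted = encoding_model(image)
            scores.append(np.corrcoef(predicted, measured_activity)[0, 1])

        # 3. Keep the best-predicting images to seed the next library.
        order = np.argsort(scores)[::-1]
        seeds = [library[i] for i in order[:keep_top]]

    # The top-ranked image from the final round serves as the reconstruction.
    return seeds[0]
```

In this sketch, convergence could be monitored by tracking the best score per iteration; the abstract's observation that time-to-convergence varies across visual areas corresponds to running the loop with activity restricted to different regions of interest.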