A key question for adapting modern deep learning architectures to functional MRI (fMRI) is how to represent the data for model input. To bridge the modality gap between fMRI and natural images, we transform the 4D volumetric fMRI data into videos of 2D fMRI activity flat maps. We train Vision Transformers on 2.3K hours of fMRI flat map videos from the Human Connectome Project using the spatiotemporal masked autoencoder (MAE) framework. We observe that masked fMRI modeling performance improves with dataset size according to a strict power scaling law. Downstream classification benchmarks show that our model learns rich representations supporting both fine-grained state decoding across subjects and subject-specific trait decoding across changes in brain state. This work is part of an ongoing open science project to build foundation models for fMRI data. Our code and datasets are available at https://github.com/MedARC-AI/fmri-fm.
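For concreteness, the sketch below illustrates the kind of spatiotemporal patch masking used in MAE-style pretraining, applied to a flat map video clip. The clip dimensions, patch sizes, and 90% mask ratio are illustrative assumptions for this sketch, not the paper's exact configuration.

```python
import numpy as np

# Minimal sketch of spatiotemporal masking for MAE-style pretraining on
# fMRI flat map videos. All shapes and the mask ratio below are
# illustrative assumptions, not the configuration used in the paper.

def patchify(video, pt=2, ph=16, pw=16):
    """Split a (T, H, W) flat map video into space-time patches.

    Returns an array of shape (num_patches, pt * ph * pw).
    """
    T, H, W = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    return (
        video.reshape(T // pt, pt, H // ph, ph, W // pw, pw)
        .transpose(0, 2, 4, 1, 3, 5)
        .reshape(-1, pt * ph * pw)
    )

def random_mask(num_patches, mask_ratio=0.9, rng=None):
    """Sample indices of visible (kept) and masked patches."""
    rng = rng or np.random.default_rng()
    num_keep = int(num_patches * (1 - mask_ratio))
    perm = rng.permutation(num_patches)
    return perm[:num_keep], perm[num_keep:]

# Example: a 16-frame, 144x320 flat map clip (hypothetical sizes).
video = np.random.randn(16, 144, 320).astype(np.float32)
patches = patchify(video)
visible_idx, masked_idx = random_mask(len(patches))
encoder_input = patches[visible_idx]  # only visible patches enter the ViT encoder
print(patches.shape, encoder_input.shape)
```

As in standard MAE training, only the small visible subset is processed by the encoder, and the decoder reconstructs the masked patches, which keeps pretraining cost low despite the high mask ratio.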