Selective Inference for Sparse Multitask Regression with Applications in Neuroimaging

Multi-task learning is frequently used to model a set of related response variables from the same set of features, improving predictive performance and modeling accuracy relative to methods that handle each response variable separately. Despite the potential of multi-task learning to yield more powerful inference than single-task alternatives, prior work in this area has largely omitted uncertainty quantification. Our focus in this paper is a common multi-task problem in neuroimaging, where the goal is to understand the relationship between multiple cognitive task scores (or other subject-level assessments) and brain connectome data collected from imaging. We propose a framework for selective inference to address this problem, with the flexibility to: (i) jointly identify the relevant covariates for each task through a sparsity-inducing penalty, and (ii) conduct valid inference in a model based on the estimated sparsity structure. Our framework offers a new conditional procedure for inference, based on a refinement of the selection event that yields a tractable selection-adjusted likelihood. This gives an approximate system of estimating equations for maximum likelihood inference, solvable via a single convex optimization problem, and enables us to efficiently form confidence intervals with approximately the correct coverage. Applied to both simulated data and data from the Adolescent Brain Cognitive Development (ABCD) study, our selective inference methods yield tighter confidence intervals than commonly used alternatives, such as data splitting. We also demonstrate through simulations that multi-task learning with selective inference can more accurately recover true signals than single-task methods.
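
The abstract contrasts its procedure with data splitting, a commonly used baseline. The sketch below illustrates only that baseline. It is not the authors' selection-adjusted likelihood. It makes several assumptions:

- The sparsity-inducing penalty is a row-sparse ℓ2,1 (group-lasso) penalty shared across tasks, so each covariate is kept or dropped for all tasks jointly.
- The model is linear with Gaussian noise and no intercept.
- Intervals are plain OLS normal-approximation intervals fitted on the held-out half.

The function names `multitask_lasso` and `split_and_infer`, the penalty level, and the simulated data are hypothetical.

```python
import numpy as np


def multitask_lasso(X, Y, lam, n_iter=500):
    """Proximal gradient for an assumed row-sparse multi-task lasso:
    minimize (1 / 2n) * ||Y - X B||_F^2 + lam * sum_j ||B[j, :]||_2.
    Row j of B holds covariate j's coefficients across all tasks."""
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    # Step size 1/L, where L = ||X||_2^2 / n bounds the gradient's Lipschitz constant.
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        Z = B - step * (X.T @ (X @ B - Y)) / n  # gradient step on the squared loss
        row_norms = np.linalg.norm(Z, axis=1, keepdims=True)
        # Group soft-thresholding: rows with norm <= step * lam become exactly zero.
        B = np.maximum(0.0, 1.0 - step * lam / np.maximum(row_norms, 1e-12)) * Z
    return B


def split_and_infer(X, Y, lam, rng, z=1.96):
    """Data-splitting baseline: select covariates on one half, fit OLS on the
    other half, and return approximate 95% intervals per (covariate, task)."""
    idx = rng.permutation(X.shape[0])
    half = X.shape[0] // 2
    sel_idx, inf_idx = idx[:half], idx[half:]
    B_hat = multitask_lasso(X[sel_idx], Y[sel_idx], lam)
    selected = np.flatnonzero(np.linalg.norm(B_hat, axis=1) > 1e-8)
    if selected.size == 0:
        empty = np.zeros((0, Y.shape[1]))
        return selected, empty, empty, empty
    X2, Y2 = X[inf_idx][:, selected], Y[inf_idx]
    n2, k = X2.shape
    coef = np.linalg.solve(X2.T @ X2, X2.T @ Y2)  # OLS fit, one column per task
    resid = Y2 - X2 @ coef
    sigma2 = (resid ** 2).sum(axis=0) / (n2 - k)  # noise variance estimate per task
    xtx_inv_diag = np.diag(np.linalg.inv(X2.T @ X2))
    se = np.sqrt(np.outer(xtx_inv_diag, sigma2))  # standard errors, shape (k, q)
    return selected, coef, coef - z * se, coef + z * se


# Hypothetical simulation: 3 tasks share 3 active covariates out of 20.
rng = np.random.default_rng(0)
n, p, q = 400, 20, 3
X = rng.standard_normal((n, p))
B_true = np.zeros((p, q))
B_true[:3] = [[1.0, 0.8, 1.2], [-1.0, -0.5, -0.7], [0.6, 0.9, 0.5]]
Y = X @ B_true + rng.standard_normal((n, q))
selected, coef, lo, hi = split_and_infer(X, Y, lam=0.3, rng=rng)
```

The inference half plays no part in selection, so its intervals are valid for the selected covariates. The cost is that only half the observations contribute to them. Conditional (selective) inference methods in general avoid discarding the selection half. Instead, they adjust the likelihood for the selection event, which is the kind of adjustment the abstract describes.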

翻译：多任务学习常常用于从相同的特征集合中模型化一组相关的响应变量，相较于处理每个响应变量的单任务方法，它可以提高预测性能和建模准确性。然而，虽然多任务学习具有比单任务学习更强的推断能力，但在这方面的先前研究往往忽略了不确定性计量。本文的重点是神经影像学中的一个常见多任务问题，即通过图像学来理解多个认知任务得分（或其他主观层面的评估）与大脑连接组数据之间的关系。我们提出了一个选择性推断框架来解决这个问题，通过使用稀疏性诱导惩罚共同识别每个任务的相关协变量，以及在基于估计的稀疏结构的模型中进行有效的推断。我们的框架提供了一种新的条件程序，用于基于一种选择事件的改进来得到可行的选择调整似然。这给出了一个适用于最大似然推断的近似估计方程系统，可以通过单个凸优化问题来解决，并且使我们能够以大致正确的覆盖程度高效地形成置信区间。在模拟数据和来自*青少年脑认知发展*（Adolescent Brain Cognitive Development，ABCD）研究的实际数据上应用我们的选择性推断方法，可以得到比常用替代方法（如数据分割）更紧的置信区间。我们还通过模拟表明，多任务学习与选择性推断可以比单任务方法更准确地恢复真实信号。