Multi-task learning is frequently used to model a set of related response variables from the same set of features, improving predictive performance and modeling accuracy relative to methods that handle each response variable separately. Despite the potential of multi-task learning to yield more powerful inference than single-task alternatives, prior work in this area has largely omitted uncertainty quantification. Our focus in this paper is a common multi-task problem in neuroimaging, where the goal is to understand the relationship between multiple cognitive task scores (or other subject-level assessments) and brain connectome data collected from imaging. We propose a framework for selective inference to address this problem, with the flexibility to: (i) jointly identify the relevant covariates for each task through a sparsity-inducing penalty, and (ii) conduct valid inference in a model based on the estimated sparsity structure. Our framework offers a new conditional procedure for inference, based on a refinement of the selection event that yields a tractable selection-adjusted likelihood. This gives an approximate system of estimating equations for maximum likelihood inference, solvable via a single convex optimization problem, and enables us to efficiently form confidence intervals with approximately the correct coverage. Applied to both simulated data and data from the Adolescent Brain Cognitive Development (ABCD) study, our selective inference methods yield tighter confidence intervals than commonly used alternatives, such as data splitting. We also demonstrate through simulations that multi-task learning with selective inference can more accurately recover true signals than single-task methods.
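To make step (i) concrete, the following is a minimal sketch of joint covariate selection across tasks. It is an illustration under stated assumptions, not the procedure of this paper: the abstract does not specify the penalty, so the sketch assumes a row-wise L2,1 (group lasso) penalty, which selects a covariate for all tasks at once, and solves it by proximal gradient descent. The function names `row_soft_threshold` and `multitask_group_lasso`, the simulated data, and the value of `lam` are all hypothetical. The sketch covers selection only; the selection-adjusted inference of step (ii) is not implemented here.

```python
import numpy as np


def row_soft_threshold(B, t):
    """Proximal operator of t * sum_j ||B[j, :]||_2: shrink each row toward zero."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return scale * B


def multitask_group_lasso(X, Y, lam, n_iter=500):
    """Minimize (1/(2n)) * ||Y - X B||_F^2 + lam * sum_j ||B[j, :]||_2.

    Assumed stand-in penalty (L2,1); solved by proximal gradient descent.
    X is (n, p), Y is (n, K); the returned B is (p, K).
    """
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    # Step size 1/L, where L = ||X||_2^2 / n is the Lipschitz constant
    # of the gradient of the smooth least-squares term.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        B = row_soft_threshold(B - step * grad, step * lam)
    return B


# Simulated example (hypothetical data): 3 tasks, 20 covariates,
# only the first 3 covariates carry signal.
rng = np.random.default_rng(0)
n, p, K = 200, 20, 3
X = rng.standard_normal((n, p))
B_true = np.zeros((p, K))
B_true[:3, :] = 2.0
Y = X @ B_true + rng.standard_normal((n, K))

B_hat = multitask_group_lasso(X, Y, lam=0.5)
selected = np.flatnonzero(np.linalg.norm(B_hat, axis=1) > 0)
print("selected covariates:", selected)
```

Confidence intervals computed naively from a refit on the selected covariates would ignore the fact that the same data chose the model. That gap is what the conditional procedure described in the abstract, or the data-splitting alternative it is compared against, is meant to address.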