Controlling the False Discovery Rate in Structural Sparsity: Split Knockoffs

Controlling the False Discovery Rate (FDR) in a variable selection procedure is critical for reproducible discoveries, and it has been studied extensively in sparse linear models. In many scenarios, however, the sparsity constraint is imposed not directly on the parameters but on a linear transformation of the parameters to be estimated. Examples include total variation, wavelet transforms, the fused LASSO, and trend filtering. In this paper, we propose a data-adaptive method for FDR control in this structural sparsity setting, the Split Knockoff method. The proposed scheme relaxes the linear subspace constraint to its neighborhood, a technique often known as variable splitting in optimization, and this relaxation brings new statistical benefits. It yields orthogonal design and split knockoff matrices that exhibit the desired FDR control empirically in structural sparsity discovery, and it improves the power of selecting strong features by enhancing the incoherence condition for model selection consistency. Yet the split knockoff statistics fail to satisfy exchangeability, a crucial property in the classical knockoff method for provable FDR control. To address this challenge, we introduce an almost supermartingale construction under a perturbation of exchangeability, which enables us to establish FDR control up to an arbitrarily small inflation that vanishes as the relaxed neighborhood enlarges. Simulation experiments show the effectiveness of split knockoffs, with possible improvements over knockoffs in both FDR control and power. An application to an Alzheimer's disease study with MRI data demonstrates that the split knockoff method can disclose important lesion regions in the brain associated with the disease, as well as connections between neighboring regions with high contrast variation during disease progression.
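
For readers unfamiliar with the classical knockoff method mentioned above, the following is a minimal sketch of its final selection step only: the knockoff+ data-dependent threshold applied to a vector of feature statistics W, where a large positive W_j is evidence for feature j. It does not construct split knockoff matrices or the split knockoff statistics of this paper; the function names `knockoff_plus_threshold` and `select`, and the example values of W, are illustrative assumptions.

```python
import numpy as np

def knockoff_plus_threshold(W, q):
    """Return the smallest t in {|W_j| : W_j != 0} such that
    (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q,
    or +inf when no candidate satisfies the bound."""
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, in increasing order.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        # Estimated false discovery proportion at threshold t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")

def select(W, q):
    """Indices of features whose statistic is at least the threshold."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)

# Made-up statistics, purely for illustration.
W_example = [5, 4, 3, -1, 2, 0.5, -0.2, 6]
print(knockoff_plus_threshold(W_example, q=0.2))
print(select(W_example, q=0.2))
```

The "+1" in the numerator is what distinguishes knockoff+ from the plain knockoff threshold; when no threshold meets the target level q, the sketch returns an infinite threshold and selects nothing.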
