Consider a scenario in which a large number of explanatory features are analyzed for their association with a response variable, and these features are partitioned into groups according to domain-specific structure. Moreover, there may be several such partitions. Multiple partitions arise in many real-life settings; one example is spatial genome-wide association studies. Researchers may be interested not only in identifying the features relevant to the response but also in determining the relevant groups within each partition, where a group is considered relevant if it contains at least one relevant feature. To ensure the replicability of findings at various resolutions, it is essential to control the false discovery rate (FDR) at multiple layers simultaneously. This paper presents a general approach that leverages various existing controlled selection procedures to produce more stable results under multilayer FDR control. The key contributions of our proposal are a generalized e-filter that provides multilayer FDR control and the construction of a specific type of generalized e-values to evaluate feature importance. A primary application of our method is an improved version of Data Splitting (DS), called the eDS-filter. We further combine the eDS-filter with a version of the group knockoff filter (gKF), resulting in a more flexible approach called the eDS+gKF filter. Simulation studies demonstrate that the proposed methods effectively control the FDR at multiple layers while maintaining or even improving power compared with other approaches. Finally, we apply the proposed methods to analyze HIV mutation data.
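To make the idea of e-value-based FDR control concrete, the following is a minimal Python sketch of the standard single-layer e-BH procedure (the e-value analogue of Benjamini-Hochberg). It is not the generalized e-filter proposed here. The helper `group_e_values` and its averaging rule are hypothetical illustrations, not the paper's construction of generalized e-values. Averaging is used only because the mean of valid e-values is itself a valid e-value under a group null.

```python
from typing import Dict, List, Sequence


def e_bh(e_values: Sequence[float], alpha: float) -> List[int]:
    """Return the (sorted) indices rejected by e-BH at level alpha.

    With e_(1) >= ... >= e_(m) the sorted e-values, e-BH rejects the k*
    hypotheses with the largest e-values, where
    k* = max{k : e_(k) >= m / (alpha * k)}  (k* = 0 if no such k).
    """
    m = len(e_values)
    # Indices ordered by decreasing e-value.
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k in range(1, m + 1):
        if e_values[order[k - 1]] >= m / (alpha * k):
            k_star = k  # keep the largest k that passes the threshold
    return sorted(order[:k_star])


def group_e_values(e_values: Sequence[float],
                   partition: Dict[str, List[int]]) -> Dict[str, float]:
    """Hypothetical aggregation: average feature e-values within each group.

    Under a group null every member's e-value has expectation <= 1,
    so their mean does too, i.e. it is a valid group-level e-value.
    """
    return {g: sum(e_values[j] for j in members) / len(members)
            for g, members in partition.items()}
```

For example, `e_bh([20, 0, 15, 0.5, 0, 10], 0.25)` selects features `[0, 2, 5]`. Applying `e_bh` to the averaged e-values of each partition gives a group-level selection for that layer. Running e-BH separately per layer, however, does not by itself guarantee that the feature-level and group-level selections are mutually consistent; coordinating the layers is the role of a multilayer filter such as the generalized e-filter described above.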