This paper presents a novel method to make statistical inferences for both the model support and regression coefficients in a high-dimensional logistic regression model. Our method is based on the repro samples framework, in which we conduct statistical inference by generating artificial samples mimicking the actual data-generating process. The proposed method has two major advantages. Firstly, for model support, we introduce the first method for constructing model confidence set in a high-dimensional setting and the proposed method only requires a weak signal strength assumption. Secondly, in terms of regression coefficients, we establish confidence sets for any group of linear combinations of regression coefficients. Our simulation results demonstrate that the proposed method produces valid and small model confidence sets and achieves better coverage for regression coefficients than the state-of-the-art debiasing methods. Additionally, we analyze single-cell RNA-seq data on the immune response. Besides identifying genes previously proved as relevant in the literature, our method also discovers a significant gene that has not been studied before, revealing a potential new direction in understanding cellular immune response mechanisms.
翻译:暂无翻译