High-dimensional logistic regression is widely used to analyze data with binary outcomes. This paper considers global testing and large-scale multiple testing for the regression coefficients in both single- and two-regression settings. A test statistic for the global null hypothesis is constructed using a generalized low-dimensional projection for bias correction, and its asymptotic null distribution is derived. A lower bound for global testing is established, which shows that the proposed test is asymptotically minimax optimal over some sparsity range. For testing the individual coefficients simultaneously, multiple testing procedures are proposed and shown to control the false discovery rate (FDR) and the number of falsely discovered variables (FDV) asymptotically. Simulation studies examine the numerical performance of the proposed tests and demonstrate their superiority over existing methods. The testing procedures are also illustrated on a data set from a metabolomics study that investigates the association between fecal metabolites and pediatric Crohn's disease and the effects of treatment on such associations.
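To make the notion of FDR control concrete, the following minimal sketch applies the classical Benjamini-Hochberg step-up rule to a handful of coefficient-wise z-statistics. It is only an illustration of what controlling the FDR at a nominal level means. The abstract does not spell out the proposed multiple testing procedures, so this is not the paper's method. The function names, the example z-statistics, and the level `alpha = 0.05` are all assumptions made for the example.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value of a z-statistic under a standard normal null."""
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices rejected by the Benjamini-Hochberg rule.

    Classical step-up procedure (an illustrative stand-in, not the
    procedure proposed in the paper): find the largest rank k such that
    the k-th smallest p-value is at most alpha * k / m, then reject the
    k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Hypothetical z-statistics, one per regression coefficient.
z_stats = [4.2, 0.3, -3.5, 1.1, -0.7, 2.9]
p_vals = [two_sided_p_value(z) for z in z_stats]
rejected = benjamini_hochberg(p_vals, alpha=0.05)
print(rejected)
```

With these made-up inputs, the three coefficients with the largest absolute z-statistics are rejected. In a high-dimensional logistic regression the z-statistics would have to come from bias-corrected estimates rather than from a plain penalized fit, and that is the step the paper's projection-based correction addresses.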