Controlling the false discovery rate (FDR) while leveraging side information in multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on test-level covariates but ignore the possible hierarchy among them. This strategy may not be optimal for complex large-scale problems, where hierarchical information often exists among the test-level covariates. We propose NeurT-FDR, which boosts statistical power and controls FDR in multiple hypothesis testing while leveraging the hierarchy among test-level covariates. Our method parametrizes the test-level covariates with a neural network and adjusts for the feature hierarchy through a regression framework, enabling flexible handling of high-dimensional features as well as efficient end-to-end optimization. We show that NeurT-FDR has strong FDR guarantees and makes substantially more discoveries than competitive baselines on synthetic and real datasets.
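To make the covariate-adaptive idea concrete, the sketch below learns per-test rejection thresholds from covariates with a small neural network, optimized end-to-end against a smooth surrogate of the FDP. This is a minimal illustration of the general approach described above, not the authors' NeurT-FDR implementation: the architecture, the sigmoid relaxation of the rejection indicator, and the FDP surrogate are illustrative assumptions.

```python
# Minimal sketch of covariate-adaptive FDR control with neural thresholds.
# Assumptions (not from the paper): a 1-hidden-layer MLP, a sigmoid
# relaxation of 1{p_i <= t_i}, and sum_i t_i as an estimate of the
# expected number of false discoveries under uniform null p-values.
import torch
import torch.nn as nn


def fit_neural_thresholds(p_values, covariates, alpha=0.1, lam=10.0,
                          epochs=500, tau=0.01):
    """Learn per-test thresholds t_i = f(x_i) by maximizing a smooth count
    of discoveries while penalizing an estimated FDP above alpha."""
    p = torch.as_tensor(p_values, dtype=torch.float32)
    x = torch.as_tensor(covariates, dtype=torch.float32)
    net = nn.Sequential(nn.Linear(x.shape[1], 32), nn.ReLU(),
                        nn.Linear(32, 1), nn.Sigmoid())  # output in (0, 1)
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(epochs):
        t = net(x).squeeze(-1) * alpha              # keep thresholds small
        soft_reject = torch.sigmoid((t - p) / tau)  # smooth 1{p_i <= t_i}
        discoveries = soft_reject.sum()
        # Under uniform null p-values, E[# false discoveries] <= sum_i t_i.
        fdp_hat = t.sum() / discoveries.clamp(min=1.0)
        loss = -discoveries + lam * torch.relu(fdp_hat - alpha)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        t = net(x).squeeze(-1) * alpha
    return (p <= t).numpy(), t.numpy()
```

The soft relaxation is what makes the decision rule differentiable, so the threshold network can be trained by gradient descent, which mirrors the end-to-end optimization emphasized in the abstract.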