We consider sequential multiple hypothesis testing when data collection carries a nontrivial cost. This problem arises, for example, in biological experiments that aim to identify differentially expressed genes in a disease process. This work builds on the generalized $\alpha$-investing framework, which enables control of the false discovery rate in a sequential testing setting. We analyze the long-term asymptotic behavior of $\alpha$-wealth theoretically, and this analysis motivates accounting for sample size in the $\alpha$-investing decision rule. Using the game-theoretic principle of indifference, we construct a decision rule that optimizes the expected return (ERO) of $\alpha$-wealth and provides an optimal sample size for the test. Empirical results show that a cost-aware ERO decision rule correctly rejects more false null hypotheses than other methods. We extend cost-aware ERO investing to finite-horizon testing, which enables the decision rule to hedge against the risk of unproductive tests. Finally, empirical tests on a real data set from a biological experiment show that cost-aware ERO produces actionable decisions about which tests to conduct and at what sample size.