In the problem of active sequential hypotheses testing (ASHT), a learner seeks to identify the true hypothesis from among a known set of hypotheses. The learner is given a set of actions and knows the random distribution of the outcome of any action under any true hypothesis. Given a target error $\delta>0$, the goal is to sequentially select the fewest number of actions so as to identify the true hypothesis with probability at least $1 - \delta$. Motivated by applications in which the number of hypotheses or actions is massive (e.g. genomics-based cancer detection), we propose efficient (greedy, in fact) algorithms and provide the first approximation guarantees for ASHT, under two types of adaptivity. Both of our guarantees are independent of the number of actions and logarithmic in the number of hypotheses. We numerically evaluate the performance of our algorithms using both synthetic and real DNA mutation data, demonstrating that our algorithms outperform previous heuristic policies by large margins.
翻译:在主动的连续假设测试(ASHT)问题中,学习者试图从已知的一系列假设中找出真实的假设。学习者得到一套行动,并知道在任何真实假设下任何行动的结果的随机分布。鉴于一个目标错误$\delta>0,目标是按顺序选择最少数的行动,以便确定真实的假设,概率至少为1美元-\delta$。受假设或行动数量庞大(例如基于基因组的癌症检测)的应用的驱动,我们提出高效的算法(事实上是基因组的),并在两种适应性下为ASHT提供第一种近似保证。我们两种保证都独立于行动的数量和假设数的对数。我们用合成和真实的DNA突变数据对算法的性能进行数字评估,表明我们的算法在很大的范围内超越了先前的超值政策。