In the problem of active sequential hypothesis testing (ASHT), a learner seeks to identify the true hypothesis from among a known set of hypotheses. The learner is given a set of actions and knows the random distribution of the outcome of any action under any true hypothesis. Given a target error $\delta>0$, the goal is to sequentially select the fewest number of actions so as to identify the true hypothesis with probability at least $1 - \delta$. Motivated by applications in which the number of hypotheses or actions is massive (e.g., genomics-based cancer detection), we propose efficient (greedy, in fact) algorithms and provide the first approximation guarantees for ASHT, under two types of adaptivity. Both of our guarantees are independent of the number of actions and logarithmic in the number of hypotheses. We numerically evaluate the performance of our algorithms using both synthetic and real-world DNA mutation data, demonstrating that our algorithms outperform previously proposed heuristic policies by large margins.
翻译:在积极的连续假设测试(ASHT)问题中,学习者试图从已知的一系列假设中找出真实的假设。学习者得到一套行动,并知道在任何真实假设下任何行动的结果的随机分布。考虑到一个目标错误 $\delta>0$,目标是按顺序选择最少量的行动,以便用概率至少1美元 -\delta$来识别真实的假设。受假设或行动的数量巨大的应用(例如基于基因组的癌症检测)的驱动,我们提出高效的算法(事实上是基因组的),并在两种适应性下为ASHT提供第一种近似保证。我们两种保证都独立于行动的数量和假设数的对数。我们用合成和现实世界DNA突变数据对算法的性能进行数字评估,表明我们的算法超越了先前提议的大边缘的超值政策。