We consider pure-exploration problems in the context of stochastic sequential adaptive experiments with a finite set of alternatives. The central objective is to answer a query regarding the alternatives with high confidence while minimizing measurement efforts. One canonical example is identifying the best-performing alternative, a problem known as ranking and selection in simulation or best-arm identification in machine learning. We formulate the problem complexity measure as a maximin optimization problem for the static fixed-budget, fixed-confidence, and posterior convergence rate settings. By incorporating dual variables directly into the analysis, we derive necessary and sufficient conditions for an allocation's optimality. The introduction of dual variables allows us to sidestep the combinatorial complexity that arises when considering only primal variables. These optimality conditions enable the extension of the top-two algorithm design principle to more general pure-exploration problems. Moreover, our analysis yields a straightforward and effective information-directed selection rule that adaptively chooses from a candidate set based on the informational value of the candidates. We demonstrate the broad range of contexts in which our design principle can be implemented. In particular, when combined with information-directed selection, top-two Thompson sampling achieves asymptotic optimality in Gaussian best-arm identification, resolving a notable open question in the pure-exploration literature. Our algorithm attains optimality in $\varepsilon$-best-arm identification (or ranking and selection with a probability of good selection guarantee) and thresholding bandits. Our results provide a general principle for adapting Thompson sampling to general pure-exploration problems. Numerical experiments highlight the efficiency of our proposed algorithms compared to existing methods.
翻译:暂无翻译