This paper revisits the classical problem of determining the bias of a weighted coin, where the bias is known to be either $p = 1/2 + \varepsilon$ or $p = 1/2 - \varepsilon$, while minimizing the expected number of coin tosses and the error probability. The optimal strategy for this problem is given by Wald's Sequential Probability Ratio Test (SPRT), which compares the log-likelihood ratio against fixed thresholds to determine a stopping time. Classical proofs of this result typically rely on analytical, continuous, and non-constructive arguments. In this paper, we present a discrete, self-contained proof of the optimality of the SPRT for this problem. We model the problem as a biased random walk on the two-dimensional (heads, tails) integer lattice, and model strategies as marked stopping times on this lattice. Our proof takes a straightforward greedy approach, showing how any arbitrary strategy may be transformed into the optimal, parallel-line "difference policy" corresponding to the SPRT, via a sequence of local perturbations that improve a Bayes risk objective.
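As an illustration of why the SPRT reduces to a parallel-line "difference policy" here: each head adds $\log\frac{1/2+\varepsilon}{1/2-\varepsilon}$ to the log-likelihood ratio and each tail subtracts the same amount, so comparing the ratio against symmetric fixed thresholds is equivalent to stopping when |heads - tails| first reaches some integer $k$. The Python sketch below is a minimal rendering of that observation, not the paper's construction. The names `llr_per_head`, `difference_policy`, and `error_probability` are hypothetical, the thresholds are assumed symmetric, and the closed-form error is the standard gambler's-ruin formula rather than a result quoted from the paper.

```python
import math


def llr_per_head(eps):
    """Log-likelihood ratio contributed by one head (a tail contributes the negative).

    Hypotheses: p = 1/2 + eps versus p = 1/2 - eps, with 0 < eps < 1/2.
    """
    return math.log((0.5 + eps) / (0.5 - eps))


def difference_policy(tosses, k):
    """Stop the first time |heads - tails| reaches k (assumed symmetric thresholds).

    `tosses` is any iterable of booleans (True = heads).
    Returns (guess, n): guess is +1 for p = 1/2 + eps, -1 for p = 1/2 - eps,
    or 0 if the tosses ran out before either threshold was reached.
    """
    diff = 0
    n = 0
    for heads in tosses:
        n += 1
        diff += 1 if heads else -1
        if diff >= k:
            return 1, n
        if diff <= -k:
            return -1, n
    return 0, n


def error_probability(eps, k):
    """Chance the walk exits on the wrong side (standard gambler's-ruin formula)."""
    ratio = (0.5 + eps) / (0.5 - eps)
    return 1.0 / (1.0 + ratio ** k)
```

For example, `difference_policy([True, True, False, True, True], 3)` stops on the fifth toss and guesses $p = 1/2 + \varepsilon$, and with $k = 1$ the error probability reduces to $1/2 - \varepsilon$, the chance that a single toss points the wrong way.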