The optimal receiver operating characteristic (ROC) curve, which gives the maximum probability of detection as a function of the probability of false alarm, is a key information-theoretic indicator of the difficulty of a binary hypothesis testing (BHT) problem. It is well known that the optimal ROC curve for a given BHT, corresponding to the likelihood ratio test, is determined by the probability distributions of the observed data under each of the two hypotheses. In some cases these two distributions may be unknown or computationally intractable, but independent samples of the likelihood ratio can be observed. This raises the problem of estimating the optimal ROC curve for a BHT from such samples. The maximum likelihood estimator of the optimal ROC curve is derived, and it is shown to converge almost surely to the true optimal ROC curve in the Lévy metric as the number of observations tends to infinity. Finite-sample bounds are obtained for three other estimators: the classical empirical estimator, which estimates the two types of error probabilities from two separate sets of samples, and two variations of the maximum likelihood estimator, called the split estimator and the fused estimator. In simulation experiments, the maximum likelihood estimator is observed to be considerably more accurate than the empirical estimator, especially when the number of samples obtained under one of the two hypotheses is small. The area under the maximum likelihood estimate of the ROC curve is derived; it is a consistent estimator of the area under the true optimal ROC curve.
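To make the classical empirical estimator concrete, here is a minimal Python sketch. It assumes the likelihood-ratio samples observed under H0 and under H1 are given as two separate lists, and that the test decides H1 when the likelihood ratio is at least a threshold t. The function names `empirical_roc` and `trapezoid_auc` are hypothetical, chosen for illustration only; this is not the maximum likelihood, split, or fused estimator, whose details are not given here.

```python
from typing import List, Tuple


def empirical_roc(lr_h0: List[float], lr_h1: List[float]) -> List[Tuple[float, float]]:
    """Empirical ROC points (P_FA, P_D) for the test 'decide H1 if LR >= t'.

    Hypothetical helper: P_FA is estimated from samples drawn under H0 and
    P_D from a separate set of samples drawn under H1.
    """
    n0, n1 = len(lr_h0), len(lr_h1)
    # Sweep every observed likelihood-ratio value as a threshold, largest first.
    thresholds = sorted(set(lr_h0) | set(lr_h1), reverse=True)
    # A threshold above every sample never decides H1: the point (0, 0).
    points = [(0.0, 0.0)]
    for t in thresholds:
        p_fa = sum(x >= t for x in lr_h0) / n0  # fraction of H0 samples wrongly flagged
        p_d = sum(x >= t for x in lr_h1) / n1   # fraction of H1 samples correctly flagged
        points.append((p_fa, p_d))
    # The smallest threshold flags every sample, so the last point is (1, 1).
    return points


def trapezoid_auc(points: List[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear curve through the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Usage: perfectly separated samples give an area of 1.0.
roc = empirical_roc([0.5, 1.0], [2.0, 3.0])
print(roc)                  # [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(roc))   # 1.0
```

Joining consecutive points with straight lines corresponds to randomizing between adjacent threshold tests. Because H0 and H1 are estimated from separate sample sets, the curve can be noisy when one set is small. That is the regime in which the abstract reports the maximum likelihood estimator to be considerably more accurate.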