Consider the problem of testing $Z \sim \mathbb P^{\otimes m}$ vs $Z \sim \mathbb Q^{\otimes m}$ from $m$ samples. Generally, to achieve a small error rate it is necessary and sufficient to have $m \asymp 1/\epsilon^2$, where $\epsilon$ measures the separation between $\mathbb P$ and $\mathbb Q$ in total variation ($\mathsf{TV}$). Achieving this, however, requires complete knowledge of the distributions $\mathbb P$ and $\mathbb Q$ and can be done, for example, using the Neyman-Pearson test. In this paper we consider a variation of the problem, which we call likelihood-free (or simulation-based) hypothesis testing, where access to $\mathbb P$ and $\mathbb Q$ (which are a priori only known to belong to a large non-parametric family $\mathcal P$) is given through $n$ i.i.d. samples from each. We demonstrate the existence of a fundamental trade-off between $n$ and $m$ given by $nm \asymp n^2_\mathsf{GoF}(\epsilon,\mathcal P)$, where $n_\mathsf{GoF}$ is the minimax sample complexity of testing between the hypotheses $H_0: \mathbb P= \mathbb Q$ vs $H_1: \mathsf{TV}(\mathbb P,\mathbb Q) \ge \epsilon$. We show this for three non-parametric families $\mathcal P$: $\beta$-smooth densities over $[0,1]^d$, the Gaussian sequence model over a Sobolev ellipsoid, and the collection of distributions on a large alphabet $[k]$ with pmfs bounded by $c/k$ for fixed $c$. The test that we propose (based on the $L^2$-distance statistic of Ingster) simultaneously achieves all points on the trade-off curve for these families. In particular, when $m\gg 1/\epsilon^2$ our test requires the number of simulation samples $n$ to be orders of magnitude smaller than what is needed for density estimation with accuracy $\asymp \epsilon$ (under $\mathsf{TV}$). This demonstrates the possibility of testing without fully estimating the distributions.
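To make the setup concrete for the large-alphabet family, the following is a minimal sketch of a likelihood-free test built on squared $L^2$ distances between empirical pmfs. It is a simplified plug-in variant written for illustration, not necessarily the exact statistic analyzed in the paper. The function names (`empirical_pmf`, `sq_l2`, `lfht_decide`), the zero threshold, and the example distributions and sample sizes are all assumptions introduced here.

```python
import random
from collections import Counter


def empirical_pmf(samples, k):
    """Empirical pmf of `samples` over the alphabet {0, ..., k-1}."""
    counts = Counter(samples)
    n = len(samples)
    return [counts[i] / n for i in range(k)]


def sq_l2(u, v):
    """Squared Euclidean distance between two pmf vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def lfht_decide(x, y, z, k):
    """Decide whether z was drawn from the law of x ("P") or of y ("Q").

    Illustrative plug-in rule (an assumption, not the paper's exact test):
    compare the squared L2 distance from the empirical pmf of z to the
    empirical pmfs of x and y, and pick the closer one.
    """
    px, py, pz = (empirical_pmf(s, k) for s in (x, y, z))
    t = sq_l2(px, pz) - sq_l2(py, pz)
    return "P" if t < 0 else "Q"


if __name__ == "__main__":
    rng = random.Random(0)
    k = 20
    # Hypothetical pair with pmfs bounded by c/k for c = 1.5; TV(P, Q) = 0.25.
    p = [1.0 / k] * k
    q = [1.5 / k if i < k // 2 else 0.5 / k for i in range(k)]
    n, m = 5000, 1000  # simulation samples per distribution, test samples
    x = rng.choices(range(k), weights=p, k=n)
    y = rng.choices(range(k), weights=q, k=n)
    z = rng.choices(range(k), weights=q, k=m)  # ground truth here is Q
    print(lfht_decide(x, y, z, k))
```

Neither `p` nor `q` is passed to `lfht_decide`: the rule sees the two distributions only through the simulated samples `x` and `y`, which is the defining constraint of the likelihood-free setting.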