We study the equivalence testing problem where the goal is to determine if the given two unknown distributions on $[n]$ are equal or $\epsilon$-far in the total variation distance in the conditional sampling model (CFGM, SICOMP16; CRS, SICOMP15) wherein a tester can get a sample from the distribution conditioned on any subset. Equivalence testing is a central problem in distribution testing, and there has been a plethora of work on this topic in various sampling models. Despite significant efforts over the years, there remains a gap in the current best-known upper bound of $\tilde{O}(\log \log n)$ [FJOPS, COLT 2015] and lower bound of $\Omega(\sqrt{\log \log n})$[ACK, RANDOM 2015, Theory of Computing 2018]. Closing this gap has been repeatedly posed as an open problem (listed as problems 66 and 87 at sublinear.info). In this paper, we completely resolve the query complexity of this problem by showing a lower bound of $\tilde{\Omega}(\log \log n)$. For that purpose, we develop a novel and generic proof technique that enables us to break the $\sqrt{\log \log n}$ barrier, not only for the equivalence testing problem but also for other distribution testing problems, such as uniblock property.
翻译:暂无翻译