We present improved algorithms and matching statistical and computational lower bounds for the problem of identity testing of $n$-dimensional distributions. In the identity testing problem, we are given as input an explicit distribution $\mu$, an $\varepsilon>0$, and access to a sampling oracle for a hidden distribution $\pi$. The goal is to distinguish whether the two distributions $\mu$ and $\pi$ are identical or at least $\varepsilon$-far apart. When there is only access to full samples from the hidden distribution $\pi$, it is known that exponentially many samples may be needed, and hence previous works have studied identity testing with additional access to various conditional sampling oracles. We consider here a significantly weaker conditional sampling oracle, called the Coordinate Oracle, and provide a fairly complete computational and statistical characterization of the identity testing problem in this new model. We prove that if an analytic property known as approximate tensorization of entropy holds for the visible distribution $\mu$, then there is an efficient identity testing algorithm for any hidden $\pi$ that uses $\tilde{O}(n/\varepsilon)$ queries to the Coordinate Oracle. Approximate tensorization of entropy is a classical tool for proving optimal mixing time bounds of Markov chains for high-dimensional distributions, and has recently been established for many families of distributions via spectral independence. We complement our algorithmic result for identity testing with a matching $\Omega(n/\varepsilon)$ statistical lower bound for the number of queries under the Coordinate Oracle. We also prove a computational phase transition: for sparse antiferromagnetic Ising models over $\{+1,-1\}^n$, in the regime where approximate tensorization of entropy fails, there is no efficient identity testing algorithm unless RP=NP.
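To make the query model concrete, the following is a minimal Python sketch of what a coordinate-level conditional sampling oracle could look like on a toy Ising model over $\{+1,-1\}^n$. The abstract does not spell out the oracle, so this assumes that a query names a configuration $x$ and a coordinate $i$ and returns a sample of coordinate $i$ conditioned on the other coordinates of $x$. The function names (`ising_weight`, `conditional_plus_prob`, `coordinate_oracle`), the edge list, and the value of `beta` are hypothetical and chosen only for illustration; this is not the testing algorithm of the paper.

```python
import math
import random


def ising_weight(x, edges, beta):
    """Unnormalized Ising weight exp(beta * sum over edges of x_u * x_v).

    A negative beta gives an antiferromagnetic model (illustrative choice).
    """
    return math.exp(beta * sum(x[u] * x[v] for u, v in edges))


def conditional_plus_prob(x, i, weight):
    """P(X_i = +1 | X_j = x_j for all j != i) for the law proportional to `weight`."""
    plus = list(x)
    plus[i] = +1
    minus = list(x)
    minus[i] = -1
    w_plus, w_minus = weight(plus), weight(minus)
    return w_plus / (w_plus + w_minus)


def coordinate_oracle(x, i, weight, rng=random):
    """Assumed oracle: sample coordinate i conditioned on the rest of x."""
    return +1 if rng.random() < conditional_plus_prob(x, i, weight) else -1


# Toy hidden distribution: a path on 3 vertices with hypothetical beta = -0.5.
edges = [(0, 1), (1, 2)]
beta = -0.5
weight = lambda x: ising_weight(x, edges, beta)

x = [+1, +1, +1]
p = conditional_plus_prob(x, 1, weight)  # equals 1 / (1 + e^2) for this toy model
sample = coordinate_oracle(x, 1, weight)
```

Each oracle call only needs the ratio of two weights, so the normalizing constant of the distribution never has to be computed. A tester in this model would compare such conditional samples from the hidden $\pi$ against the known conditionals of $\mu$.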