The Huge Object model is a distribution testing model in which we are given access to independent samples from an unknown distribution over the set of strings $\{0,1\}^n$, but are only allowed to query a few bits from each sample. We investigate the problem of testing whether a distribution is supported on at most $m$ elements in this model. It turns out that the behavior of this property is surprisingly intricate, especially when also considering the question of adaptivity. We prove lower and upper bounds for both adaptive and non-adaptive algorithms in the one-sided and two-sided error regimes. Our bounds are tight when $m$ is fixed to a constant (and the distance parameter $\varepsilon$ is the only variable). For the general case, our bounds are at most $O(\log m)$ apart. In particular, our results show a surprising $O(\log \varepsilon^{-1})$ gap between the number of queries required for non-adaptive testing as compared to adaptive testing. For one-sided error testing, we also show that an $O(\log m)$ gap between the number of samples and the number of queries is necessary. Our results utilize a wide variety of combinatorial and probabilistic methods.
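The access model described above can be sketched as follows. This is a hypothetical illustrative interface, not an implementation from the paper: the tester may request fresh i.i.d. samples from the unknown distribution, but it never sees a sample in full; it only learns individual bits through explicit queries, and its query count is what the bounds above measure.

```python
import random

class HugeObjectOracle:
    """Illustrative sketch of Huge Object model access (names are ours):
    samples from an unknown distribution over {0,1}^n are held internally,
    and the tester may only read single bits of previously drawn samples."""

    def __init__(self, distribution, n):
        # distribution: list of (string, probability) pairs, strings of length n
        self.n = n
        self.strings = [s for s, _ in distribution]
        self.weights = [p for _, p in distribution]
        self.samples = []   # drawn samples, hidden from the tester
        self.queries = 0    # number of bit queries made so far

    def draw_sample(self):
        """Draw an i.i.d. sample; the tester receives only an opaque handle."""
        s = random.choices(self.strings, weights=self.weights, k=1)[0]
        self.samples.append(s)
        return len(self.samples) - 1

    def query(self, handle, j):
        """Reveal bit j of the sample identified by `handle`."""
        self.queries += 1
        return self.samples[handle][j]
```

An adaptive tester may choose each `query` based on the bits revealed so far, while a non-adaptive tester must fix all (handle, position) pairs in advance; the abstract's $O(\log \varepsilon^{-1})$ gap separates these two regimes.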