We consider the problem of estimating the support size of a distribution $D$. We study this problem through the lens of distribution testing and seek to understand the power of conditional sampling (denoted COND), in which one may query the given distribution conditioned on an arbitrary subset $S$. The primary contribution of this work is a new approach to lower bounds for the COND model that relies on tools from information theory and communication complexity. This approach yields strong lower bounds for the COND model and its extensions. 1) We nearly close the longstanding gap between the upper bound $O(\log \log n + \frac{1}{\epsilon^2})$ and the lower bound $\Omega(\sqrt{\log \log n})$ for the COND model by providing a nearly matching lower bound. Surprisingly, we show that even if the actual probabilities are revealed along with the COND samples, $\Omega(\log \log n + \frac{1}{\epsilon^2 \log (1/\epsilon)})$ queries are still necessary. 2) We obtain the first non-trivial lower bound for COND equipped with an additional oracle that reveals the conditional probabilities of the samples (to the best of our knowledge, this model subsumes all previously studied models): in particular, we demonstrate that $\Omega(\log \log \log n + \frac{1}{\epsilon^2 \log (1/\epsilon)})$ queries are necessary.
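To make the query model concrete, the following is a minimal sketch of a COND oracle for an explicitly given distribution. The class name `CondOracle`, its query counter, and the convention of returning `None` when $D(S) = 0$ are illustrative assumptions and are not taken from the paper; it only simulates the access model and does not implement any estimation algorithm or lower-bound argument.

```python
import random
from typing import Dict, Hashable, Iterable, Optional


class CondOracle:
    """Toy conditional-sampling (COND) oracle over an explicit distribution D.

    Hypothetical helper for illustration only.
    """

    def __init__(self, probs: Dict[Hashable, float], seed: Optional[int] = None):
        self.probs = dict(probs)          # element -> probability under D
        self.rng = random.Random(seed)    # private RNG so runs can be reproduced
        self.queries = 0                  # number of COND queries issued so far

    def cond(self, subset: Iterable[Hashable]) -> Optional[Hashable]:
        """Draw one sample from D conditioned on `subset`.

        Assumed convention: return None when D(subset) = 0.
        """
        self.queries += 1
        # Deduplicate while keeping order, and keep only elements in the support.
        items = [x for x in dict.fromkeys(subset) if self.probs.get(x, 0.0) > 0.0]
        if not items:
            return None
        weights = [self.probs[x] for x in items]
        # random.choices normalizes the weights, which yields D restricted to S.
        return self.rng.choices(items, weights=weights, k=1)[0]
```

For example, with `oracle = CondOracle({"a": 0.5, "b": 0.5, "c": 0.0})`, the call `oracle.cond(["a", "c"])` can only return `"a"`, while `oracle.cond(["c"])` returns `None` under the convention assumed here. Query complexity in this model corresponds to the value of `oracle.queries`.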