Estimating the support size of a distribution is a well-studied problem in statistics. Motivated by the fact that the support size is highly non-robust (small perturbations of the distribution can drastically change it) and thus hard to estimate, Goldreich [ECCC 2019] studied the query complexity of estimating the $\epsilon$-\emph{effective support size} $\text{Ess}_\epsilon$ of a distribution ${P}$, defined as the smallest support size of any distribution that is $\epsilon$-close to ${P}$ in total variation distance. He gives an algorithm in the dual access setting (where we may both receive random samples and query the sampling probability $p(x)$ for any $x$) for a bicriteria approximation, returning an answer in $[\text{Ess}_{(1+\beta)\epsilon},(1+\gamma) \text{Ess}_{\epsilon}]$ for some values $\beta, \gamma > 0$. However, his algorithm has either query complexity that is super-constant in the support size or a super-constant approximation ratio $1+\gamma = \omega(1)$. He then asked whether this is necessary, or whether a constant-factor approximation is possible with a number of queries independent of the support size. We answer his question by showing that query complexity independent of the support size $n$ is achievable not only for $\gamma>0$ but also for $\gamma=0$; that is, the bicriteria relaxation is not necessary. Specifically, we give an algorithm with query complexity $O(\frac{1}{\beta^3 \epsilon^3})$ that, for any $0 < \epsilon, \beta < 1$, outputs a number $\tilde{n} \in [\text{Ess}_{(1+\beta)\epsilon},\text{Ess}_\epsilon]$. We also show that the approximate version with approximation ratio $1+\gamma$ can be solved with query complexity $O\left(\frac{1}{\beta^2 \epsilon} + \frac{1}{\beta \epsilon \gamma^2}\right)$. Our algorithm is very simple and can be stated in $4$ short lines of pseudocode.
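To make the definition of $\text{Ess}_\epsilon$ concrete, the following Python sketch computes it exactly for a fully known finite distribution. This is only an illustration and not the sublinear algorithm of this paper: it reads every probability, whereas the dual access setting allows only samples and point queries. It relies on the fact that the distribution closest to ${P}$ among those supported on a set $S$ lies at total variation distance $1 - P(S)$, so $\text{Ess}_\epsilon$ is the smallest $k$ for which the $k$ largest probabilities sum to at least $1-\epsilon$. The function name `effective_support_size` and the floating-point tolerance `tol` are assumptions made for this example.

```python
from itertools import accumulate


def effective_support_size(probs, eps, tol=1e-9):
    """Exact eps-effective support size of a fully known distribution.

    probs: list of probabilities summing to 1 (zeros are allowed).
    eps:   total variation budget, 0 <= eps < 1.
    tol:   hypothetical slack to absorb floating-point rounding.

    Returns the smallest k such that the k largest probabilities
    carry mass at least 1 - eps.
    """
    if not 0 <= eps < 1:
        raise ValueError("eps must satisfy 0 <= eps < 1")
    # Keeping the heaviest elements first maximises the retained mass
    # for every support size k.
    heaviest_first = sorted(probs, reverse=True)
    for k, mass in enumerate(accumulate(heaviest_first), start=1):
        if mass >= 1 - eps - tol:
            return k
    # Only reached if probs sums to noticeably less than 1 - eps.
    return len(heaviest_first)


if __name__ == "__main__":
    # Three heavy elements plus 100 elements of mass 0.001 each.
    probs = [0.5, 0.3, 0.1] + [0.001] * 100
    print(effective_support_size(probs, eps=0.0))  # plain support size
    print(effective_support_size(probs, eps=0.1))  # light tail is ignored
```

In this example the plain support size is dominated by a tail carrying only a tenth of the mass, while the effective support size for $\epsilon = 0.1$ counts just the three heavy elements, which illustrates why the effective version is the more robust quantity.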